
http://www.seas.upenn.edu/~cit596/notes/dave/fundamentals.html

Overview

Sets
 Importance: languages are sets
 A set is a collection of "things," called the elements or members of the set. It is
essential to have a criterion for determining, for any given thing, whether it is
or is not a member of the given set. This criterion is called the membership
criterion of the set.
 There are two common ways of indicating the members of a set:
o List all the elements, e.g. {a, e, i, o, u}
o Provide some sort of an algorithm or rule, such as a grammar
 Notation:
o To indicate that x is a member of set S, we write x ∈ S
o We denote the empty set (the set with no members) as {} or ∅
o If every element of set A is also an element of set B, we say that A is
a subset of B, and write A ⊆ B
o If every element of set A is also an element of set B, but B also has some
elements not contained in A, we say that A is a proper subset of B, and
write A ⊂ B

Operations on Sets
 The union of sets A and B, written A ∪ B, is a set that contains everything that
is in A, or in B, or in both.
 The intersection of sets A and B, written A ∩ B, is a set that contains exactly
those elements that are in both A and B.
 The set difference of set A and set B, written A - B, is a set that contains
everything that is in A but not in B.
 The complement of a set A, written as -A or (better) A with a bar drawn over it,
is the set containing everything that is not in A. This is almost always used in
the context of some universal set U that contains "everything" (meaning
"everything we are interested in at the moment"). Then -A is shorthand for U -
A.

Additional terminology
The cardinality of a set A, written |A|, is the number of elements in a set A.

The powerset of a set Q, written 2^Q, is the set of all subsets of Q. The notation
suggests the fact that a set containing n elements has a powerset containing 2^n
elements.

Two sets are disjoint if they have no elements in common, that is, if A ∩ B = ∅.
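
These definitions map directly onto the set type of many programming languages. Here is a minimal sketch in Python (the sample sets and the universal set U are our own choices, not part of the notes):

    from itertools import chain, combinations

    A = {'a', 'e', 'i', 'o', 'u'}
    B = {'a', 'b', 'c'}
    U = set('abcdefghijklmnopqrstuvwxyz')   # a universal set for the complement

    print(A | B)              # union
    print(A & B)              # intersection
    print(A - B)              # set difference
    print(U - A)              # complement of A, relative to U
    print(len(A))             # cardinality |A|
    print(A & B == set())     # disjoint? False here, since 'a' is shared

    # Powerset of Q: all 2**len(Q) subsets
    Q = {1, 2, 3}
    powerset = list(chain.from_iterable(
        combinations(sorted(Q), r) for r in range(len(Q) + 1)))
    print(len(powerset))      # 8 == 2**3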

Relations and Functions


 Importance: need basic familiarity with the terminology
 A relation on sets S and T is a set of ordered pairs (s, t), where
o s ∈ S (s is a member of S),
o t ∈ T,
o S and T need not be different,
o The set of all first elements (s) is the domain of the relation, and
o The set of all second elements is the range of the relation.
 A relation is a function iff every element of S occurs once and only once as a
first element of the relation.
 A relation is a partial function iff every element of S occurs at most once as a
first element of the relation.
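
As a small illustration (ours, not from the notes), a relation can be stored as a set of ordered pairs and tested for being a function or a partial function:

    S = {1, 2, 3}
    T = {'a', 'b'}
    R = {(1, 'a'), (2, 'b'), (3, 'a')}    # a relation on S and T

    domain = {s for (s, t) in R}          # set of all first elements
    range_ = {t for (s, t) in R}          # set of all second elements

    # Count how often each element of S occurs as a first element.
    counts = {}
    for (s, t) in R:
        counts[s] = counts.get(s, 0) + 1

    is_function = all(counts.get(s, 0) == 1 for s in S)
    is_partial_function = all(c <= 1 for c in counts.values())
    print(domain, range_, is_function, is_partial_function)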

Graphs
 Importance: Automata are graphs.
 A graph consists of two sets
o A set V of vertices (or nodes), and
o A set E of edges (or arcs).
 An edge consists of a pair of vertices in V. If the edges are ordered, the graph is
a digraph (a contraction of "directed graph").
 A walk is a sequence of edges, where the finish vertex of each edge is the start
vertex of the next edge. Example: (a, e), (e, i), (i, o), (o, u).
 A path is a walk with no repeated edges.
 A simple path is a path with no repeated vertices.

Trees
 Importance: Trees are used in some algorithms.
 A tree is a kind of digraph:
o It has one distinguished vertex called the root;
o There is exactly one path from the root to each vertex; and
o The level of a vertex is the length of the path to it from the root.
 Terminology:
o if there is an edge from A to B, then A is the parent of B, and B is
the child of A.
o A leaf is a node with no children.
o The height of a tree is the largest level number of any vertex.

Proof techniques
Importance
 Because this is a formal subject, the textbook is full of proofs
 Proofs are encapsulated understanding

 You may be asked to learn a very few important proofs

Proof by induction
 Prove something about P1 (the basis)
 Prove that if it is true for Pn, then it is true for Pn+1 (the induction step)

 Conclude that it is true for all P

Proof by contradiction (also called reductio ad absurdum)


 Assume some fact P is false
 Show that this leads to a contradiction

 Conclude P must be true

BNF

BNF Notation
 Importance
o standard way to define programming language syntax

o good background for later part of course


 Stands for either "Backus Normal Form" or "Backus-Naur Form"
 Includes two sets:
o terminals are symbols in the language being defined, e.g. while, for, =, [.
o nonterminals are used to talk about the language, and are enclosed in
angle brackets, e.g. <assignment statement>, <declaration>.
 There are only two metasymbols

o ::= means "is defined as"


o | means "or"
 BNF uses recursion heavily

 BNF examples
 <loop statement> ::= <while loop> | <for loop>

 <while loop> ::= while ( <condition> ) <statement>

 <for loop> ::= for ( <expression> ; <expression>;
<expression> ) <statement>

 <assignment statement> ::=
 <variable> = <expression>

 Recursion is used frequently:
 <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

 <integer> ::= <digit> | <integer> <digit>

 <letter> ::= <lowercase letter> | <uppercase letter>

 <lowercase letter> ::=
 a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z

 <name> ::= <letter> | <name> <letter>
 | <name> <digit>

 It's hard to set limits:
 <small number> ::= <digit> | <digit> <digit>
 | <digit> <digit> <digit>
 | <digit> <digit> <digit> <digit>
 | <digit> <digit> <digit> <digit> <digit>
More BNF Examples
Use recursion to build lists:
<statement list> ::= <statement>
| <statement list> <statement>

You can invent new nonterminals:


<set> ::= { <element list> }

<element list> ::=


<empty> | <nonempty element list>

<nonempty element list> ::= <element>


| <nonempty element list> , <element>

Here's how you define the null string:


<empty> ::=

You can use multiple definitions in place of the "or":


<nonempty element list> ::= <element>

<nonempty element list> ::=


<nonempty element list> , <element>

Expressions in BNF
You can use the structure of BNF to show the order of operations:
<expression> ::=
<expression> + <term>
| <expression> - <term>
| <term>

<term> ::=
<term> * <factor>
| <term> / <factor>
| <factor>

<factor> ::=
<primary> ** <factor>
| <primary>

<primary> ::=
- <primary>
| <element>

<element> ::=
( <expression> )
| <variable>
| <number>
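
For example, here is how this grammar forces * to bind more tightly than +. To derive a + b * c (taking a, b, and c to be <variable>s), the + can only be introduced at the <expression> level, so the derivation must begin by rewriting <expression> as <expression> + <term>; the b * c part must then be generated entirely inside the second <term>, as <term> * <factor>. Because the * is introduced lower in the derivation, it is applied first.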

What is BNF notation?


BNF is an acronym for "Backus Naur Form". John Backus and Peter Naur were the first
to introduce a formal notation to describe the syntax of a given language (this was for
the description of the ALGOL 60 programming language, see [Naur 60]). To be
precise, most of BNF was introduced by Backus in a report presented at an earlier
UNESCO conference on ALGOL 58. Few read the report, but when Peter Naur read it
he was surprised at some of the differences he found between his and Backus's
interpretation of ALGOL 58. He decided that the syntax for the successor to ALGOL
(in which all participants of the first design had come to recognize some weaknesses)
should be given in a similar form, so that all participants would be aware of what they
were agreeing to. He made a few modifications that are almost universally used and
drew up on his own the BNF for ALGOL 60 at the meeting where it was designed.
Depending on how you attribute presenting it to the world, it was either by Backus in
59 or Naur in 60. (For more details on this period of programming languages history,
see the introduction to Backus's Turing award article in Communications of the ACM,
Vol. 21, No. 8, August 1978. This note was suggested by William B. Clodius from Los
Alamos Natl. Lab.)

Since then, almost every author of books on new programming languages used it to
specify the syntax rules of the language. See [Jensen 74] and [Wirth 82] for examples.

The following is taken from [Marcotty 86]:

The meta-symbols of BNF are:

::=

meaning "is defined as"

|

meaning "or"

< >

angle brackets used to surround category names.

The angle brackets distinguish syntax rules names (also called non-terminal symbols)
from terminal symbols which are written exactly as they are to be represented. A
BNF rule defining a nonterminal has the form:
nonterminal ::= sequence_of_alternatives consisting of strings of
terminals or nonterminals separated by the meta-symbol |
For example, the BNF production for a mini-language is:
<program> ::= program
<declaration_sequence>
begin
<statements_sequence>
end ;
This shows that a mini-language program consists of the keyword "program"
followed by the declaration sequence, then the keyword "begin" and the statements
sequence, finally the keyword "end" and a semicolon.

(end of quotation)

In fact, many authors have introduced some slight extensions of BNF for the ease of
use:

 optional items are enclosed in meta symbols [ and ], example:


 <if_statement> ::= if <boolean_expression> then
 <statement_sequence>
 [ else
 <statement_sequence> ]
 end if ;
 repetitive items (zero or more times) are enclosed in meta symbols { and },
example:
 <identifier> ::= <letter> { <letter> | <digit> }

this rule is equivalent to the recursive rule:

<identifier> ::= <letter> |
<identifier> ( <letter> | <digit> )
 terminals of only one character are surrounded by quotes (") to distinguish
them from meta-symbols, example:
 <statement_sequence> ::= <statement> { ";" <statement> }
 in recent text books, terminal and non-terminal symbols are distinguished by
using bold faces for terminals and suppressing < and > around non-terminals.
This greatly improves readability. The example then becomes:
 if_statement ::= if boolean_expression then
 statement_sequence
 [ else
 statement_sequence ]
 end if ";"
Now, as a last example (maybe not the easiest to read!), here is the definition of BNF
expressed in BNF:
syntax ::= { rule }
rule ::= identifier "::=" expression
expression ::= term { "|" term }
term ::= factor { factor }
factor ::= identifier |
quoted_symbol |
"(" expression ")" |
"[" expression "]" |
"{" expression "}"
identifier ::= letter { letter | digit }
quoted_symbol ::= """ { any_character } """
BNF is not only important for describing syntax rules in books; it is also very commonly
used (with variants) by syntactic tools. See for example any book on LEX and YACC,
the standard UNIX parser generators. If you have access to any Unix machine, you
will probably find a chapter of the documentation on these tools.

Fundamental Concepts
There are three fundamental concepts that we will be working with in this course:

 Languages
o A language is a subset of the set of all possible strings formed from a
given set of symbols.
o There must be a membership criterion for determining whether a
particular string is in the set.
 Grammars
o A grammar is a formal system for accepting or rejecting strings.
o A grammar may be used as the membership criterion for a language.
 Automata
o An automaton is a simplified, formalized model of a computer.
o An automaton may be used to compute the membership function for a
language.
o Automata can also compute other kinds of things.

Languages
Definitions 1
An alphabet is a finite, nonempty set of symbols. We use Σ to denote an
alphabet. Note: Symbols may be more than one English letter long, e.g. while is a
single symbol in Pascal.

A string is a finite sequence of symbols from Σ.


The length of a string s, denoted |s|, is the number of symbols in it.

The empty string is the string of length zero.

It really looks like this: (nothing at all!)
but for convenience we usually write it like this: λ

Σ* denotes the set of all strings that are composed of zero or more
symbols of Σ.

Σ+ denotes the set of all strings composed of one or more
symbols of Σ. That is, Σ+ = Σ* - {λ}.

A language is a subset of Σ*.

Languages
More Definitions
The concatenation of two strings is formed by joining the sequence of symbols in the
first string with the sequence of symbols in the second string.

If a string S can be formed by concatenating two strings A and B, S=AB, then A is


called a prefix of S and B is called a suffix of S.

The reverse of a string S, written S^R, is obtained by reversing the sequence of
symbols in the string. For example, if S = abcd, then S^R = dcba.

Any string that belongs to a language is said to be a word or a sentence of that


language. 

Operations on Languages
Languages are sets. Therefore, any operation that can be performed on sets can be
performed on languages.

If L, L1 and L2 are languages, then

 L1 ∪ L2 is a language.
 L1 ∩ L2 is a language.
 L1 - L2 is a language.
 -L = Σ* - L, the complement of L, is a language.

In addition,
 L1L2, the catenation of L1 and L2, is a language.
(The strings of L1L2 are strings that have a word of L1 as a prefix and a word
of L2 as a suffix.)
 L^n, the catenation of L with itself n times, is a language.
 L* = {λ} ∪ L ∪ LL ∪ LLL ∪ LLLL ∪ ..., the star closure of L, is a
language.
 L+ = L ∪ LL ∪ LLL ∪ LLLL ∪ ..., the positive closure of L, is a
language.

Definition of a Grammar
A grammar G is a quadruple G = (V, T, S, P)
where
 V is a finite set of (meta)symbols, or variables.
 T is a finite set of terminal symbols.
 S ∈ V is a distinguished element of V called the start symbol.
 P is a finite set of productions (or rules).

A production has the form X → Y

where
 X ∈ (V ∪ T)+, and
 Y ∈ (V ∪ T)*.

We'll put this in words, but -- learn the symbols. The words are just "training wheels".
 X ∈ (V ∪ T)+ : X is a member of the set of strings composed of any mixture of
variables and terminal symbols, but X is not the empty string.
 Y ∈ (V ∪ T)* : Y is a member of the set of strings composed of any mixture of
variables and terminal symbols; Y is allowed to be the empty string.

Derivations
Productions are rules that can be used to define the strings belonging to a language.

Suppose language L is defined by a grammar G = (V, T, S, P). You can find a string
belonging to this language as follows:

1. Start with a string w consisting of only the start symbol S.


2. Find a substring x of w that matches the left-hand side of some production p in
P.
3. Replace the substring x of w with the right-hand side of production p.
4. Repeat steps 2 and 3 until the string consists entirely of symbols of T (that is, it
doesn't contain any variables).
5. The final string belongs to the language L.

Each application of step 3 above is called a derivation step.
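
A minimal sketch of this procedure in Python (the grammar encoding, with uppercase letters as variables, and the example productions are our own):

    import random

    # P: left-hand side -> list of right-hand sides.
    # Uppercase letters are variables; everything else is a terminal.
    P = {
        'S': ['aA', 'bS'],
        'A': ['aA', 'b'],
    }

    def derive(start='S', max_steps=50):
        w = start
        for _ in range(max_steps):
            positions = [i for i, c in enumerate(w) if c in P]
            if not positions:                  # all terminals: done (step 4)
                return w
            i = random.choice(positions)       # step 2: find a variable
            rhs = random.choice(P[w[i]])       # pick a production for it
            w = w[:i] + rhs + w[i+1:]          # step 3: replace it
        return None    # gave up; a derivation need not terminate

    print(derive())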

Suppose w is a string that can be written as uxv,


where

 u and v are elements of (V ∪ T)*,
 x is an element of (V ∪ T)+, and
 there is a production x → y.

Then we can write

uxv ⇒ uyv
and we say that uxv directly derives uyv.

Notation:

 S ⇒ T : S derives T (or T is derived from S) in exactly one step.

 S ⇒* T : S derives T in zero or more steps.

 S ⇒+ T : S derives T in one or more steps.

Automata
An automaton is a simple model of a computer.

There is no formal definition for "automaton"--instead, there are various kinds of


automata, each with its own formal definition.

Generally, an automaton

 has some form of input,


 has some form of output,
 has internal states,
 may or may not have some form of storage,
 is hard-wired rather than programmable.

An automaton that computes a Boolean (yes-no) function is called an acceptor.


Acceptors may be used as the membership criterion of a language.
An automaton that produces more general output (typically a string) is called
a transducer. 

Deterministic Finite Acceptors
DFAs are:
 Deterministic--there is no element of choice
 Finite--only a finite number of states and arcs

 Acceptors--produce only a yes/no answer

A DFA is drawn as a graph, with each state represented by a circle. 

 
One designated state is the start state. 

 
Some states (possibly including the start state) can be designated as final states. 

 
Arcs between states represent state transitions -- each such arc is labeled with the
symbol that triggers the transition. 

Example DFA
Example input string: 1 0 0 1 1 1 0 0 
Operation
 Start with the "current state" set to the start state and a "read head" at the
beginning of the input string;
 while there are still characters in the string:

o Read the next character and advance the read head;


o From the current state, follow the arc that is labeled with the character
just read; the state that the arc points to becomes the next current
state;
 When all characters have been read, accept the string if the current state is a
final state, otherwise reject the string.

Sample trace: q0 1 q1 0 q3 0 q1 1 q0 1 q1 1 q0 0 q2 0 q0 

Since q0 is a final state, the string is accepted. 

Implementing a DFA
If you don't object to the go to statement, there is an easy way to implement a DFA:
q0 : read char;
if eof then accept string;
if char = 0 then go to q2;
if char = 1 then go to q1;

q1 : read char;
if eof then reject string;
if char = 0 then go to q3;
if char = 1 then go to q0;

q2 : read char;
if eof then reject string;
if char = 0 then go to q0;
if char = 1 then go to q3;

q3 : read char;
if eof then reject string;
if char = 0 then go to q1;
if char = 1 then go to q2;

Implementing a DFA, part 2


If you are not allowed to use a go to statement, you can fake it with a combination of
a loop and a case statement:
state := q0;
loop
case state of
q0 : read char;
if eof then accept string;
if char = 0 then state := q2;
if char = 1 then state := q1;

q1 : read char;
if eof then reject string;
if char = 0 then state := q3;
if char = 1 then state := q0;

q2 : read char;
if eof then reject string;
if char = 0 then state := q0;
if char = 1 then state := q3;

q3 : read char;
if eof then reject string;
if char = 0 then state := q1;
if char = 1 then state := q2;
end case;
end loop;
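
A third alternative, common in practice, is to drive a generic loop from a transition table. Here is a hedged sketch in Python of the same four-state DFA (the dictionary encoding is our own; the transitions follow the code above):

    # delta[state][symbol] -> next state
    delta = {
        'q0': {'0': 'q2', '1': 'q1'},
        'q1': {'0': 'q3', '1': 'q0'},
        'q2': {'0': 'q0', '1': 'q3'},
        'q3': {'0': 'q1', '1': 'q2'},
    }
    final_states = {'q0'}

    def accepts(string, start='q0'):
        state = start
        for ch in string:
            state = delta[state][ch]     # follow the labeled arc
        return state in final_states     # accept iff we end in a final state

    print(accepts('10011100'))   # True: the sample trace ends in q0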

Formal Definition of a DFA


A deterministic finite acceptor or dfa is a quintuple:
M = (Q, Σ, δ, q0, F)
where
 Q is a finite set of states,
 Σ is a finite set of symbols, the input alphabet,
 δ: Q × Σ → Q is a transition function,
 q0 ∈ Q is the initial state,
 F ⊆ Q is a set of final states.

Note: The fact that δ is a function implies that every vertex has an outgoing arc for
each member of Σ.

We can also define an extended transition function δ* as

δ*: Q × Σ* → Q.

If a DFA M = (Q, Σ, δ, q0, F) is used as a membership criterion, then the set of
strings accepted by M is a language. That is,
L(M) = {w ∈ Σ* : δ*(q0, w) ∈ F}.

Languages that can be defined by dfas are called regular languages.

Acceptor for Ada identifiers


In Ada, an identifier consists of a letter followed by any number of letters, digits, and
underlines. However, the identifier may not end in an underline or have two
underlines in a row.

Here is an automaton to recognize Ada identifiers.


M = (Q, Σ, δ, q0, F), where

 Q is {q0, q1, q2, q3},
 Σ is {letter, digit, underline},
 δ is given by

δ(q0, letter) = q1      δ(q1, letter) = q1
δ(q0, digit) = q3       δ(q1, digit) = q1
δ(q0, underline) = q3   δ(q1, underline) = q2

δ(q2, letter) = q1      δ(q3, letter) = q3
δ(q2, digit) = q1       δ(q3, digit) = q3
δ(q2, underline) = q3   δ(q3, underline) = q3

 q0 ∈ Q is the initial state,
 {q1} ⊆ Q is a set of final states.
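
Written as a table, this δ can be run directly. A sketch in Python (the classify helper, which maps actual characters onto the symbols letter, digit, and underline, is our own addition):

    delta = {
        'q0': {'letter': 'q1', 'digit': 'q3', 'underline': 'q3'},
        'q1': {'letter': 'q1', 'digit': 'q1', 'underline': 'q2'},
        'q2': {'letter': 'q1', 'digit': 'q1', 'underline': 'q3'},
        'q3': {'letter': 'q3', 'digit': 'q3', 'underline': 'q3'},
    }

    def classify(ch):
        if ch == '_':
            return 'underline'
        if ch.isdigit():
            return 'digit'
        return 'letter'          # assume everything else is a letter

    def is_ada_identifier(s):
        state = 'q0'
        for ch in s:
            state = delta[state][classify(ch)]
        return state == 'q1'     # {q1} is the set of final states

    for w in ['count_2', 'count__2', '_count', 'count_']:
        print(w, is_ada_identifier(w))
    # count_2 is True; the other three are rejected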

Abbreviated Acceptor for Ada Identifiers


The following is an abbreviated automaton (my terminology) to recognize Ada
identifiers. You might use something like this in a course on compiler construction.

The difference is that, in this automaton, δ does not appear to be a function. It looks
like a partial function; that is, it is not defined for all values of Q × Σ.
We can complete the definition of δ by assuming the existence of an "invisible" state
and some "invisible" arcs. Specifically,

 There is exactly one implicit error state;


 If there is no path shown from a state for a given symbol in Σ, there is an
implicit path for that symbol to the error state;
 The error state is a trap state: once you get into it, all arcs (one for each symbol
in Σ) lead back to it; and
 The error state is not a final state.

The automaton represented above is really exactly the same as the automaton on the
previous page; we just haven't bothered to draw one state and a whole bunch of arcs
that we know must be there.

I don't think you'll find abbreviated automata in the textbook. They aren't usually
allowed in a formal course. However, if you ever use an automaton to design a lexical
scanner, putting in an explicit error state just clutters up the diagram. 

Nondeterministic Finite Acceptors


A finite-state automaton can be nondeterministic in either or both of two ways:

A state may have two or more arcs emanating from it labeled with the same symbol.
When the symbol occurs in the input, either arc may be followed.

A state may have one or more arcs emanating from it labeled with λ
(the empty string). These arcs may optionally be followed without looking at the
input or consuming an input symbol.

Due to nondeterminism, the same string may cause an nfa to end up in one of several
different states, some of which may be final while others are not. The string is
accepted if any possible ending state is a final state.

Example NFAs
Implementing an NFA
If you think of an automaton as a computer, how does it handle nondeterminism?
There are two ways that this could, in theory, be done:

1. When the automaton is faced with a choice, it always (magically) chooses


correctly. We sometimes think of the automaton as consulting
an oracle which advises it as to the correct choice.
2. When the automaton is faced with a choice, it spawns a new process, so that
all possible paths are followed simultaneously.

The first of these alternatives, using an oracle, is sometimes attractive


mathematically. But if we want to write a program to implement an nfa, that isn't
feasible.

There are three ways, two feasible and one not yet feasible, to simulate the second
alternative:

1. Use a recursive backtracking algorithm. Whenever the automaton has to make


a choice, cycle through all the alternatives and make a recursive call to
determine whether any of the alternatives leads to a solution (final state).
2. Maintain a state set or a state vector, keeping track of all the states that the
nfa could be in at any given point in the string.
3. Use a quantum computer. Quantum computers explore literally all possibilities
simultaneously. They are theoretically possible, but are at the cutting edge of
physics. It may (or may not) be feasible to build such a device.

Recursive Implementation of NFAs

An nfa can be implemented by means of a recursive search from the start state
for a path (directed by the symbols of the input string) to a final state.

Here is a rough outline of such an implementation:

function nfa (state A) returns Boolean:
    local state B, symbol x;
    for each λ transition from state A to some state B do
        if nfa (B) then return True;
    if there is a next symbol then
    {   read next symbol (x);
        for each x transition from state A to some state B do
            if nfa (B) then return True;
        return False;
    }
    else
    {   if A is a final state then return True;
        else return False;
    }

One problem with this implementation is that it could get into an infinite
loop if there is a cycle of λ transitions. This could be prevented by maintaining
a simple counter (How?).

State-Set Implementation of NFAs


Another way to implement an NFA is to keep either a state set or a bit vector of all the
states that the NFA could be in at any given time. Implementation is easier if you use
a bit-vector approach (v[i] is True iff state i is a possible state), since most languages
provide vectors, but not sets, as a built-in datatype. However, it's a bit easier to
describe the algorithm if you use a state-set approach, so that's what we will do. The
logic is the same in either case.
function nfa (state set A) returns Boolean:
    local state set B, state a, state b, state c, symbol x;

    for each λ transition from some state a in A
            to some state b not in A do
        add b to A;
    while there is a next symbol do
    {   read next symbol (x);
        B := ∅;
        for each a in A do
        {   for each x transition from a to some state b do
                add b to B;
        }
        for each λ transition from
                some state b in B to some state c not in B do
            add c to B;
        A := B;
    }
    if any element of A is a final state then
        return True;
    else
        return False;
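
Here is a hedged Python version of the state-set approach (the encoding, with None standing for a λ label, is our own; unlike the single λ pass in the pseudocode above, the closure helper follows λ transitions to a fixed point):

    # delta maps (state, symbol) -> set of states; the symbol None stands for λ.
    def closure(states, delta):
        # All states reachable from `states` by zero or more λ transitions.
        result, stack = set(states), list(states)
        while stack:
            for b in delta.get((stack.pop(), None), ()):
                if b not in result:
                    result.add(b)
                    stack.append(b)
        return result

    def nfa_accepts(string, delta, start, finals):
        A = closure({start}, delta)
        for x in string:
            B = set()
            for a in A:
                B |= delta.get((a, x), set())    # follow the x transitions
            A = closure(B, delta)                # then close under λ
        return bool(A & finals)                  # accept if any state is final

    # Example: strings over {a, b} that end in 'ab'
    delta = {('s', 'a'): {'s', 'p'}, ('s', 'b'): {'s'}, ('p', 'b'): {'f'}}
    print(nfa_accepts('aab', delta, 's', {'f'}))   # True
    print(nfa_accepts('aba', delta, 's', {'f'}))   # False

Because the closure runs to a fixed point, a cycle of λ transitions cannot cause an infinite loop here.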

Formal Definition of NFAs


The extension of our notation to NFAs is somewhat strained.

A nondeterministic finite acceptor or nfa is defined by the quintuple

M = (Q, Σ, δ, q0, F)
where
 Q is a finite set of states,
 Σ is a finite set of symbols, the input alphabet,
 δ: Q × (Σ ∪ {λ}) → 2^Q is a transition function,
 q0 ∈ Q is the initial state,
 F ⊆ Q is a set of final states.

These are all the same as for a dfa except for the definition of δ:
 Transitions on λ are allowed in addition to transitions on elements of Σ, and
 The range of δ is 2^Q rather than Q. This means that the values of δ are not
elements of Q, but rather are sets of elements of Q.

The language defined by nfa M is defined as

L(M) = {w ∈ Σ* : δ*(q0, w) ∩ F ≠ ∅}

DFA = NFA
Two acceptors are equivalent if they accept the same language.

A DFA is just a special case of an NFA that happens not to have any null transitions
or multiple transitions on the same symbol. So DFAs are not more powerful than
NFAs.

For any NFA, we can construct an equivalent DFA (see below). So NFAs are not
more powerful than DFAs. DFAs and NFAs define the same class of languages --
the regular languages.

To translate an NFA into a DFA, the trick is to label each state in the DFA with a set
of states from the NFA. Each state in the DFA summarizes all the states that the NFA
might be in. If the NFA contains |Q| states, the resultant DFA could contain as many
as 2^|Q| states. (Usually far fewer states will be needed.)
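
Here is a sketch of that construction in Python (λ transitions are omitted for brevity, and the NFA encoding is our own):

    def nfa_to_dfa(delta, start, finals, alphabet):
        # delta: (state, symbol) -> set of NFA states (no λ moves here).
        start_set = frozenset({start})
        dfa_delta, todo, seen = {}, [start_set], {start_set}
        while todo:
            A = todo.pop()
            for x in alphabet:
                # The DFA state reached from A on x is the set of all
                # NFA states reachable from any member of A on x.
                B = frozenset(b for a in A for b in delta.get((a, x), ()))
                dfa_delta[(A, x)] = B
                if B not in seen:
                    seen.add(B)
                    todo.append(B)
        dfa_finals = {A for A in seen if A & finals}
        return dfa_delta, start_set, dfa_finals

    # NFA for strings over {a, b} ending in 'ab':
    delta = {('s', 'a'): {'s', 'p'}, ('s', 'b'): {'s'}, ('p', 'b'): {'f'}}
    dfa_delta, q0, F = nfa_to_dfa(delta, 's', {'f'}, 'ab')
    print(len({A for (A, x) in dfa_delta}))   # 3 states here, far fewer than 2**3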
Primitive Regular Expressions
A regular expression can be used to define a language. A regular expression
represents a "pattern;" strings that match the pattern are in the language, strings that
do not match the pattern are not in the language.

As usual, the strings are over some alphabet Σ.

The following are primitive regular expressions:

 x, for each x ∈ Σ,
 λ, the empty string, and
 ∅, indicating no strings at all.

Thus, if |Σ| = n, then there are n+2 primitive regular expressions defined over Σ.

Here are the languages defined by the primitive regular expressions:

 For each x ∈ Σ, the primitive regular expression x denotes the language {x}.
That is, the only string in the language is the string "x".
 The primitive regular expression λ denotes the language {λ}. The only string
in this language is the empty string.
 The primitive regular expression ∅ denotes the language {}. There
are no strings in this language.

Regular Expressions
Every primitive regular expression is a regular expression.

We can compose additional regular expressions by applying the following


rules a finite number of times:

 If r1 is a regular expression, then so is (r1).


 If r1 is a regular expression, then so is r1*.
 If r1 and r2 are regular expressions, then so is r1r2.
 If r1 and r2 are regular expressions, then so is r1+r2.

Here's what the above notation means:


 Parentheses are just used for grouping.
 The postfix star indicates zero or more repetitions of the preceding regular
expression. Thus, if x ∈ Σ, then the regular expression x* denotes the language
{λ, x, xx, xxx, ...}.
 Juxtaposition of r1 and r2 indicates any string described by r1 immediately
followed by any string described by r2. For example, if x, y ∈ Σ, then the
regular expression xy describes the language {xy}.
 The plus sign, read as "or," denotes the language containing strings described
by either of the component regular expressions. For example, if x, y ∈ Σ, then
the regular expression x+y describes the language {x, y}.

Precedence: * binds most tightly, then juxtaposition, then +. For example,

a+bc* denotes the language {a, b, bc, bcc, bccc, bcccc, ...}.
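
These regular expressions map onto the regex notation of most programming languages, with + written as | (and λ as an empty alternative). A quick sketch using Python's re module (the test strings are ours):

    import re

    # a+bc* in the notation above becomes a|bc* here.
    pattern = re.compile(r'(a|bc*)\Z')   # match() anchors at the start; \Z at the end

    for w in ['a', 'b', 'bc', 'bccc', 'ab', 'c']:
        print(w, bool(pattern.match(w)))
    # a, b, bc, bccc are in the language; ab and c are not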

Languages Defined by Regular Expressions


There is a simple correspondence between regular expressions and the languages they
denote:
Regular expression       L(regular expression)
x, for each x ∈ Σ        {x}
λ                        {λ}
∅                        {}
(r1)                     L(r1)
r1*                      (L(r1))*
r1 r2                    L(r1) L(r2)
r1 + r2                  L(r1) ∪ L(r2)

Building Regular Expressions


Here are some hints on building regular expressions. We will assume Σ = {a, b, c}.
Zero or more.
a* means "zero or more a's." To say "zero or more ab's," that is, {λ, ab, abab,
ababab, ...}, you need to say (ab)*. Don't say ab*, because that denotes the
language {a, ab, abb, abbb, abbbb, ...}.
One or more.
Since a* means "zero or more a's", you can use aa* (or equivalently, a*a) to
mean "one or more a's." Similarly, to describe "one or more ab's," that is, {ab,
abab, ababab, ...}, you can use ab(ab)*.
Zero or one.
You can describe an optional a with (a+λ).
Any string at all.
To describe any string at all (with Σ = {a, b, c}), you can use (a+b+c)*.
Any nonempty string.
This can be written as any character from Σ followed by any string at all:
(a+b+c)(a+b+c)*.
Any string not containing....
To describe any string at all that doesn't contain an a (with Σ = {a, b, c}), you
can use (b+c)*.
Any string containing exactly one...
To describe any string that contains exactly one a, put "any string not
containing an a" on either side of the a, like this: (b+c)*a(b+c)*.

Example Regular Expressions


These are from exercise 14 on page 78 of your textbook.

Give regular expressions for the following languages on Σ = {a, b, c}.
All strings containing exactly one a.
(b+c)*a(b+c)*
All strings containing no more than three a's.
We can describe the string containing zero, one, two, or three a's (and nothing
else) as
(λ+a)(λ+a)(λ+a)

Now we want to allow arbitrary strings not containing a's at the places marked
by X's:
X(λ+a)X(λ+a)X(λ+a)X

so we put in (b+c)* for each X:

(b+c)*(λ+a)(b+c)*(λ+a)(b+c)*(λ+a)(b+c)*
All strings which contain at least one occurrence of each symbol in Σ.
The problem here is that we cannot assume the symbols are in any particular
order. We have no way of saying "in any order", so we have to list the possible
orders:
abc+acb+bac+bca+cab+cba

To make it easier to see what's happening, let's put an X in every place we want
to allow an arbitrary string:
XaXbXcX + XaXcXbX + XbXaXcX + XbXcXaX + XcXaXbX + XcXbXaX

Finally, replacing the X's with (a+b+c)* gives the final (unwieldy) answer:
(a+b+c)*a(a+b+c)*b(a+b+c)*c(a+b+c)* +
(a+b+c)*a(a+b+c)*c(a+b+c)*b(a+b+c)* +
(a+b+c)*b(a+b+c)*a(a+b+c)*c(a+b+c)* +
(a+b+c)*b(a+b+c)*c(a+b+c)*a(a+b+c)* +
(a+b+c)*c(a+b+c)*a(a+b+c)*b(a+b+c)* +
(a+b+c)*c(a+b+c)*b(a+b+c)*a(a+b+c)*

All strings which contain no runs of a's of length greater than two.
We can fairly easily build an expression containing no a, one a, or one aa:
(b+c)*(λ+a+aa)(b+c)*

but if we want to repeat this, we need to be sure to have at least one non-a
between repetitions:
(b+c)*(λ+a+aa)(b+c)*((b+c)(b+c)*(λ+a+aa)(b+c)*)*
All strings in which all runs of a's have lengths that are multiples of three.
(aaa+b+c)*
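
Answers like these can be spot-checked mechanically with the same translation of + to |. A sketch in Python (the test strings are ours):

    import re

    # All runs of a's have lengths that are multiples of three.
    runs_of_three = re.compile(r'(aaa|b|c)*\Z')

    for w in ['', 'baaac', 'aaaaaa', 'baac', 'aa']:
        print(repr(w), bool(runs_of_three.match(w)))
    # '', 'baaac', and 'aaaaaa' are accepted; 'baac' and 'aa' are rejected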

Regular Expressions Denote Regular Languages
Regular Expressions and Automata
Languages described by deterministic finite acceptors (dfas) are called regular
languages.

For any nondeterministic finite acceptor (nfa) we can find an equivalent dfa. Thus
nfas also describe regular languages.

Regular expressions also describe regular languages. We will show that regular
expressions are equivalent to nfas by doing two things:

1. For any given regular expression, we will show how to build an nfa that accepts
the same language. (This is the easy part.)
2. For any given nfa, we will show how to construct a regular expression that
describes the same language. (This is the hard part.)

From Primitive Regular Expressions to NFAs

Every nfa we construct will have a single start state and a single final state. We
will build more complex nfas out of simpler nfas, each with a single start state
and a single final state. The simplest nfas will be those for the primitive regular
expressions.

For any x in Σ, the regular expression x denotes the language
{x}. This nfa represents exactly that language.
Note that if this were a dfa, we would have to include arcs for
all the other elements of Σ.

The regular expression λ denotes the language {λ}, that is,
the language containing only the empty string.

The regular expression ∅ denotes the language ∅;
no strings belong to this language, not even the empty string.
Since the final state is unreachable, why bother to have it at
all? The answer is that it simplifies the construction if every
nfa has exactly one start state and one final state. We could do without this final
state, but we would have more special cases to consider, and it doesn't hurt
anything to include it.

From Regular Expressions to NFAs


We will build more complex nfas out of simpler nfas, each with a single start state and
a single final state. Since we have nfas for primitive regular expressions, we need to
compose them for the operations of grouping, juxtaposition, union, and Kleene star
(*).

For grouping (parentheses), we don't really need to do anything. The nfa that
represents the regular expression (r1) is the same as the nfa that represents r1.

For juxtaposition (strings in L(r1) followed by strings in L(r2)), we simply chain the
nfas together, as shown. The initial and final states of the original nfas (boxed) stop
being initial and final states; we include new initial and final states. (We could make
do with fewer states and fewer λ transitions here, but we aren't trying for the best
construction; we're just trying to show that a construction is possible.)

The + denotes "or" in a regular expression, so


it makes sense that we would use an nfa with a
choice of paths. (This is one of the reasons that
it's easier to build an nfa than a dfa.)

The star denotes zero or more applications of the regular expression, so we need to
set up a loop in the nfa. We can do this with a backward-pointing λ arc. Since we
might want to traverse the regular expression zero times (thus matching the null
string), we also need a forward-pointing λ arc to bypass the nfa entirely.

From NFAs to Regular Expressions (Part I)


Creating a regular expression to recognize the same strings as an nfa is trickier than
you might expect, because the nfa may have arbitrary loops and cycles. Here's the
basic approach (details supplied later):
1. If the nfa has more than one final state, convert it to an nfa with only one final
state. Make the original final states nonfinal, and add a λ transition from each
to the new (single) final state.
2. Consider the nfa to be a generalized transition graph, which is just like an nfa
except that the edges may be labeled with arbitrary regular expressions. Since
the labels on the edges of an nfa may be either λ or members of Σ, each of
these can be considered to be a regular expression.
3. Remove states one by one from the nfa, relabeling edges as you go, until only
the initial and the final state remain.
4. Read the final regular expression from the two-state automaton that results.

The regular expression derived in the final step accepts the same language as the
original nfa.

Since we can convert an nfa to a regular expression, and we can convert a regular
expression to an nfa, the two are equivalent formalisms--that is, they both describe the
same class of languages, the regular languages.

From NFAs to Regular Expressions (Part II)


There are two complicated parts to extracting a regular expression from an NFA:
removing states, and reading the regular expression off the resultant two-state
generalized transition graph.

Here's how to delete a state (this is taken with minor modifications from Figure 3.9 on
page 85 of your textbook):

To delete state Q, where Q is neither the initial state nor the final state, replace the
arcs that pass through Q with arcs that bypass it: if an arc labeled r1 runs from Qi to
Q, Q has a self-loop labeled r2, and an arc labeled r3 runs from Q to Qj, replace them
with a single arc from Qi to Qj labeled r1r2*r3. (The figure from the textbook is not
reproduced here.)

You should convince yourself that this transformation is "correct", in the sense that
paths which leave you in Qi in the original will leave you in Qi in the replacement, and
similarly for Qj.

 What if state Q has connections to more than two other states, say, Qi, Qj, and
Qk? Then you have to consider these states pairwise: Qi with Qj, Qj with Qk, and
Qi with Qk.
 What if some of the arcs in the original state are missing? There are too many
cases to work this out in detail, but you should be able to figure it out for any
specific case, using the above as a model.

You will end up with an nfa that looks like this, where r1, r2, r3,
and r4 are (probably very complex) regular expressions. The
resultant nfa represents the regular expression
r1*r2(r4 + r3r1*r2)*
(you should verify that this is indeed the correct regular
expression). All you have to do is plug in the correct values for r1, r2, r3, and r4.

Three Ways of Defining a Language


This page presents an example solved three different ways. No new information is
presented.

Problem: Define a language containing all strings over Σ = {a, b, c} where no symbol
ever follows itself; that is, no string contains any of the substrings aa, bb, or cc.

Definition by grammar
Define the grammar G = (V, T, S, P) where

 V = {S, ...some other variables...}.


 T = Σ = {a, b, c}.
 The start symbol is S.
 P is given below.

These should be pretty obvious except for the set V, which we generally make up as
we construct P.

Since the empty string belongs to the language, we need the production

S → λ
Some strings belonging to the language begin with the symbol a. The a can be
followed by any other string in the language, so long as this other string does not
begin with a. So we make up a variable, call it NOTA, to produce these other strings,
and add the production
S → a NOTA

By similar logic, we add the variables NOTB and NOTC and the productions


S → b NOTB

S → c NOTC

Now, NOTA is either the empty string, or some string that begins with b, or some string
that begins with c. If it begins with b, then it must be followed by a (possibly empty)
string that does not begin with b--and we already have a variable for that case, NOTB.
Similarly, if NOTA is some string beginning with c, the c must be followed by NOTC.
This gives the productions

NOTA → λ

NOTA → b NOTB

NOTA → c NOTC

Similar logic gives the following productions for NOTB and NOTC:

NOTB → λ

NOTB → a NOTA

NOTB → c NOTC

NOTC → λ

NOTC → a NOTA

NOTC → b NOTB

We add NOTA, NOTB, and NOTC to set V, and we're done.

Example derivation:
S ⇒ a NOTA ⇒ a b NOTB ⇒ a b a NOTA ⇒ a b a c NOTC ⇒ a b a c.

Definition by nfa
Defining the language by an nfa follows almost exactly the same logic as defining the
language by a grammar. Whenever an input symbol is read, go to a state that will
accept any symbol other than the one read. To emphasize the similarity with the
preceding grammar, we will name our states to correspond to variables in the
grammar.

Definition by regular expression


As usual, it is more difficult to find a suitable regular expression to define this
language, and the regular expression we do find bears little resemblance to the
grammar or to the nfa.

The key insight is that strings of the language can be viewed as consisting of zero or
more repetitions of the symbol a, and between them must be strings of the
form bcbcbc... or cbcbcb.... So we can start with
X a Y a Y a Y a ... Y a Z

where we have to find suitable expressions for X, Y, and Z. But first, let's get the
above expression in a proper form, by getting rid of the "...". This gives
X a (Y a)* Z

and, since we might not have any a's at all,


(X a (Y a)* Z) + X

Now X can be empty, a single b, a single c, or can consist of an alternating sequence
of b's and c's. This gives

X = (λ + b + c + (bc)* + (cb)*)

This isn't quite right, because it doesn't allow (bc)*b or (cb)*c. When we include


these, we get

X = (λ + b + c + (bc)* + (cb)* + (bc)*b + (cb)*c)

This is now correct, but could be simplified. The last four terms include the
λ + b + c cases, so we can drop those three terms. Then we can combine the last four
terms into

X = (bc)*(b + λ) + (cb)*(c + λ)
Now, what about Z? As it happens, there isn't any difference between what we need
for Z and what we need for X, so we can also use the above expression for Z.

Finally, what about Y? This is just like the others, except that Y cannot be empty.
Luckily, it's easy to adjust the above expression for X and Z so that it can't be empty:
Y = ((bc)*b + (cb)*c)

Substituting into (X a (Y a)* Z) + X, we get

((bc)*(b + λ) + (cb)*(c + λ)) a (((bc)*b + (cb)*c) a)* ((bc)*(b + λ) +
(cb)*(c + λ)) + (bc)*(b + λ) + (cb)*(c + λ)

Regular Grammars

Grammars for Regular Languages


We already know:
 A language defined by a dfa is a regular language.
 Any dfa can be regarded as a special case of an nfa.

 Any nfa can be converted to an equivalent dfa; thus, a language defined by an


nfa is a regular language.
 A regular expression can be converted to an equivalent nfa; thus, a language
defined by a regular expression is a regular language.
 An nfa can (with some effort!) be converted to a regular expression.

So dfas, nfas, and regular expressions are all "equivalent," in the sense that any
language you define with one of these could be defined by the others as well.

We also know that languages can be defined by grammars. Now we will begin to
classify grammars; and the first kinds of grammars we will look at are the regular
grammars. As you might expect, regular grammars will turn out to be equivalent to
dfas, nfas, and regular expressions.

Classifying Grammars
Recall that a grammar G is a quadruple G = (V, T, S, P)
where
 V is a finite set of (meta)symbols, or variables.
 T is a finite set of terminal symbols.
 S ∈ V is a distinguished element of V called the start symbol.
 P is a finite set of productions.

The above is true for all grammars. We will distinguish among different kinds of


grammars based on the form of the productions. If the productions of a grammar all
follow a certain pattern, we have one kind of grammar. If the productions all fit a
different pattern, we have a different kind of grammar.

Productions have the form:

(V ∪ T)+ → (V ∪ T)*.

Different types of grammars can be defined by putting additional restrictions on the


left-hand side of productions, the right-hand side of productions, or both.

Right-Linear Grammars
In general, productions have the form:
(V ∪ T)+ → (V ∪ T)*.

In a right-linear grammar, all productions have one of the two forms:

V → T*V
or
V → T*

That is, the left-hand side must consist of a single variable, and the right-hand side
consists of any number of terminals (members of T) optionally followed by a single
variable. (The "right" in "right-linear grammar" refers to the fact that, following the
arrow, a variable can occur only as the rightmost symbol of the production.)

Right-Linear Grammars and NFAs


There is a simple connection between right-linear grammars and NFAs, as suggested
by the following diagrams:
A → x B

A → x y z B

A → B
A → x

As an example of the correspondence between an nfa and a right-linear grammar, the following
automaton and grammar both recognize the set of strings consisting of an even number of 0's and an
even number of 1's.

S → λ
S → 0 B
S → 1 A

A → 0 C
A → 1 S

B → 0 S
B → 1 C

C → 0 A
C → 1 B

Left-Linear Grammars
In a left-linear grammar, all productions have one of the two forms:

V → VT*

or

V → T*

That is, the left-hand side must consist of a single variable, and the right-hand side
consists of an optional single variable followed by any number of terminals. This is
just like a right-linear grammar except that, following the arrow, a variable can occur
only on the left of the terminals, rather than only on the right.

We won't pay much attention to left-linear grammars, because they turn out to be
equivalent to right-linear grammars. Given a left-linear grammar for language L, we
can construct a right-linear grammar for the same language, as follows:

Step: Construct a right-linear grammar for the (different) language L^R.
Method: Replace each production A → x of L with a production A → x^R, and
replace each production A → B x with a production A → x^R B.

Step: Construct an nfa for L^R from the right-linear grammar. This nfa should have
just one final state.
Method: We talked about deriving an nfa from a right-linear grammar on an earlier
page. If the nfa has more than one final state, we can make those states nonfinal, add
a new final state, and put λ transitions from each previously final state to the new
final state.

Step: Reverse the nfa for L^R to obtain an nfa for L.
Method: 1. Start with the nfa that recognizes L^R. 2. Ensure the nfa has only a single
final state. 3. Reverse the direction of the arcs. 4. Make the initial state final and the
final state initial.

Step: Construct a right-linear grammar for L from the nfa for L.
Method: This is the technique we just talked about on an earlier page.

Regular Grammars
A regular grammar is either a right-linear grammar or a left-linear grammar.

To be a right-linear grammar, every production of the grammar must have one of the
two forms V → T*V or V → T*.

To be a left-linear grammar, every production of the grammar must have one of the
two forms V → VT* or V → T*.

You do not get to mix the two. For example, consider a grammar with the following
productions:

S → λ
S → a X
X → S b

This grammar is neither right-linear nor left-linear, hence it is not a regular grammar.
We have no reason to suppose that the language it generates is a regular language (one
that is generated by a dfa).

In fact, the grammar generates a language whose strings are of the form a^n b^n. This
language cannot be recognized by a dfa. (Why not?)
Properties of Regular Languages
Closure I
A set is closed under an operation if, whenever the operation is applied to members of
the set, the result is also a member of the set.

For example, the set of integers is closed under addition, because x+y is an integer
whenever x and y are integers. However, integers are not closed under division: if x
and y are integers, x/y may or may not be an integer.

We have defined several operations on languages:

L1 ∪ L2   Strings in either L1 or L2
L1 ∩ L2   Strings in both L1 and L2
L1L2      Strings composed of one string from L1 followed by one string from L2
-L1       All strings (over the same alphabet) not in L1
L1*       Zero or more strings from L1 concatenated together
L1 - L2   Strings in L1 that are not in L2
L1^R      Strings in L1, reversed

We will show that the set of regular languages is closed under each of these
operations. We will also define the operations of "homomorphism" and "right
quotient" and show that the set of regular languages is also closed under these
operations.

Closure II: Union, Concatenation, Negation,


Kleene Star, Reverse
General Approach
 Build automata (dfas or nfas) for each of the languages involved.
 Show how to combine the automata to create a new automaton that
recognizes the desired language.
 Since the language is represented by an nfa or dfa, conclude that the language
is regular.

Union of L1 and L2
 Create a new start state.
 Make a λ transition from the new start state to each of the original start
states.
Concatenation of L1 and L2
 Put a λ transition from each final state of L1 to the initial state of L2
 Make the original final states of L1 nonfinal

Negation of L1

 Start with a (complete) dfa, not with an nfa.


 Make every final state nonfinal and every nonfinal state final.

Kleene Star of L1

 Make a new start state; connect it to the original start state with a λ transition.

 Make a new final state; connect the original final states (which become nonfinal) to it with λ transitions.

 Connect the new start state and new final state with a pair of λ transitions.

Reverse of L1

 Start with an automaton with just one final state.


 Make the initial state final and the final state initial.

 Reverse the direction of every arc.

Closure III: Intersection and Set Difference


Just as with the other operations, you prove that regular languages are closed under
intersection and set difference by starting with automata for the initial languages, and
constructing a new automaton that represents the operation applied to the initial
languages. However, the constructions are somewhat trickier.

In these constructions you form a completely new machine, whose states are each
labeled with an ordered pair of state names: the first element of each pair is a state
from L1, and the second element of each pair is a state from L2. (Usually you won't
need a state for every such pair, just some of them.)

1. Begin by creating a start state whose label is (start state of L1, start state of L2).
2. Repeat the following until no new arcs can be added:
1. Find a state (A, B) that lacks a transition for some x in Σ.
2. Add a transition on x from state (A, B) to state (δ(A, x), δ(B, x)). (If this
state doesn't already exist, create it.)

The same construction is used for both intersection and set difference. The distinction
is in how the final states are selected.

Intersection: Mark a state (A, B) as final if both (i) A is a final state in L1, and (ii) B
is a final state in L2.
Set difference: Mark a state (A, B) as final if A is a final state in L1, but B is not a
final state in L2.
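
Here is a sketch of this product construction in Python (the encoding is our own; both arguments must be complete dfas over the same alphabet):

    def product_dfa(d1, s1, f1, d2, s2, f2, alphabet, op):
        # d1, d2: (state, symbol) -> state.  `op` picks the final states.
        start = (s1, s2)
        delta, todo, seen = {}, [start], {start}
        while todo:
            A, B = todo.pop()
            for x in alphabet:
                nxt = (d1[(A, x)], d2[(B, x)])
                delta[((A, B), x)] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
        finals = {(A, B) for (A, B) in seen if op(A in f1, B in f2)}
        return delta, start, finals

    intersection = lambda a, b: a and b        # both components final
    difference = lambda a, b: a and not b      # first final, second not

    # Example over {a}: an even number of a's, intersected with at least one a.
    d1 = {('e', 'a'): 'o', ('o', 'a'): 'e'}
    d2 = {('z', 'a'): 'n', ('n', 'a'): 'n'}
    delta, q0, F = product_dfa(d1, 'e', {'e'}, d2, 'z', {'n'}, 'a', intersection)
    print(q0 in F)   # False: the empty string has zero a's, so it is rejected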

Closure IV: Homomorphism


Note: "Homomorphism" is a term borrowed from group theory. What we refer to as a
"homomorphism" is really a special case.

Suppose Σ and Γ are alphabets (not necessarily distinct). Then a homomorphism h is
a function from Σ to Γ*.

If w is a string in Σ*, then we define h(w) to be the string obtained by replacing each
symbol x ∈ Σ by the corresponding string h(x) ∈ Γ*.

If L is a language on Σ, then its homomorphic image is a language on Γ. Formally,

h(L) = {h(w) : w ∈ L}

Theorem. If L is a regular language on Σ, then its homomorphic image h(L) is a
regular language on Γ. That is, if you replaced every string w in L with h(w), the
resultant set of strings would be a regular language on Γ.

Proof.

 Construct a dfa representing L. This is possible because L is regular.
 For each arc in the dfa, replace its label x ∈ Σ with h(x) ∈ Γ*.
 If an arc is labeled with a string w of length greater than one, replace the arc
with a series of arcs and (new) states, so that each arc is labeled with a single
element of Γ. The result is an nfa that recognizes exactly the language h(L).
 Since the language h(L) can be specified by an nfa, the language is regular.
Q.E.D.
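
For example (our own illustration), take Σ = {a, b}, Γ = {0, 1}, h(a) = 01, and h(b) = λ:

    h = {'a': '01', 'b': ''}    # h(b) = '' plays the role of λ

    def apply_h(w):
        # Replace each symbol x of w by the string h(x).
        return ''.join(h[x] for x in w)

    print(apply_h('aba'))   # 0101
    print(apply_h('bb'))    # the empty string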

Closure V: Right Quotient


Let L1 and L2 be languages on the same alphabet. The right quotient of L1 with L2 is
L1/L2 = {w : wx ∈ L1 and x ∈ L2}

That is, the strings in L1/L2 are strings from L1 "with their tails cut off." If some string
of L1 can be broken into two parts, w and x, where x is in language L2, then w is in
language L1/L2.

Theorem. If L1 and L2 are both regular languages, then L1/L2 is a regular language.
Proof: Again, the proof is by construction. We start with a dfa M(L1) for L1; the dfa
we construct is exactly like the dfa for L1, except that (in general) different states will
be marked as final.

For each state Qi in M(L1), determine if it should be final in M(L1/L2) as follows:

 Starting in state Qi as if it were the initial state, determine if any of the strings
in language L2 are accepted by M(L1). If there are any, then state Qi should be
marked as final in M(L1/L2). (Why?)

That's the basic algorithm. However, one of the steps in it is problematical: since
language L2 may have an infinite number of strings, how do we determine whether
some unknown string in the language is accepted by M(L1) when starting at Qi?
We cannot try all the strings, because we insist on a finite algorithm.

The trick is to construct a new dfa that recognizes the intersection of two languages:
(1) L2, and (2) the language that would be accepted by dfa M(L1) if Qi were its initial
state. We already know we can build this machine. Now, if this machine
recognizes any string whatever (we can check this easily), then the two machines have
a nonempty intersection, and Qi should be a final state. (Why?)

We have to go through this same process for every state Qi in M(L1), so the algorithm
is too lengthy to step through by hand. However, it is enough for our purposes that the
algorithm exists.

Finally, since we can construct a dfa that recognizes L1/L2, this language is therefore
regular, and we have shown that the regular languages are closed under right quotient.

Standard Representations
A regular language is given in a standard representation if it is specified by one of:
 A finite automaton (dfa or nfa).
 A regular expression.

 A regular grammar.

(The importance of these particular representations is simply that they are precise and
unambiguous; thus, we can prove things about languages when they are expressed in a
standard representation.)

Membership. If L is a language on alphabet Σ, L is in a standard representation, and
w ∈ Σ*, then there is an algorithm for determining whether w ∈ L.

Proof. Build the automaton and use it to test w.

Finiteness. If language L is specified by a standard representation, there is an


algorithm to determine whether the set L is empty, finite, or infinite.
Proof. Build the automaton.

 If there is no path from the initial state to a final state, then the language is
empty (and finite).
 If there is a path containing a cycle from the initial state to some final state,
then the language is infinite.
 If no path from the initial state to a final state contains a cycle, then the
language is finite.

Equivalence. If languages L1 and L2 are each given in a standard representation, then


there is an algorithm to determine whether the languages are identical.

Proof. Construct the language

(L1 ∩ -L2) ∪ (-L1 ∩ L2)

If this language is empty, then L1 = L2. (Why?)

The Pumping Lemma


The Pigeonhole Principle
pigeonhole: (pij' un hole) n
1. a hole or small recess for pigeons to nest
2. a small open compartment (as in a desk or cabinet) for keeping letters or
documents
3. a neat category which usu. fails to reflect actual complexities.

pigeonhole principle: if n objects are put into m containers, where n > m, then at least


one container must hold more than one object.

The pigeonhole principle can be used to prove that certain infinite languages are not
regular. (Remember, any finite language is regular.)

As we have informally observed, dfas "can't count." This can be shown formally by
using the pigeonhole principle. As an example, we show that L = {a^n b^n : n > 0} is not
regular. The proof is by contradiction.

Suppose L is regular. There are an infinite number of values of n, but M(L) has only a
finite number of states. By the pigeonhole principle, there must be distinct values
of i and j such that a^i and a^j end in the same state. From this state,

 b^i must end in a final state, because a^i b^i is in L; and
 b^i must end in a nonfinal state, because a^j b^i is not in L.

Since the state reached cannot be both final and nonfinal, we have a
contradiction. Thus our assumption, that L is regular, must be incorrect. Q.E.D.

The Pumping Lemma


Here's what the pumping lemma says:
 If an infinite language is regular, it can be defined by a dfa.
 The dfa has some finite number of states (say, n).

 Since the language is infinite, some strings of the language must have length
> n.
 For a string of length > n accepted by the dfa, the walk through the dfa must
contain a cycle.
 Repeating the cycle an arbitrary number of times must yield another string
accepted by the dfa.

The pumping lemma for regular languages is another way of proving that a given


(infinite) language is not regular. (The pumping lemma cannot be used to prove that a
given language is regular.)

The proof is always by contradiction. A brief outline of the technique is as follows:

 Assume the language L is regular.


 By the pigeonhole principle, any sufficiently long string in L must repeat some
state in the dfa; thus, the walk contains a cycle.
 Show that repeating the cycle some number of times ("pumping" the cycle)
yields a string that is not in L.
 Conclude that L is not regular.

Why this is hard:


 We don't know the dfa (if we did, the language would be regular!). Thus, we
have to do the proof for an arbitrary dfa that accepts L.
 Since we don't know the dfa, we certainly don't know the cycle.

Why we can sometimes pull it off:


 We get to choose the string (but it must be in L).
 We get to choose the number of times to "pump."

Applying the Pumping Lemma


Here's a more formal definition of the pumping lemma:
If L is an infinite regular language, then there exists some positive integer m such that
any string w ∈ L whose length is m or greater can be decomposed into three parts,
xyz, where

 |xy| is less than or equal to m,

 |y| > 0,
 w_i = xy^i z is also in L for all i = 0, 1, 2, 3, ....

Here's what it all means:


 m is a (finite) number chosen so that strings of length m or greater must contain
a cycle. Hence, m must be equal to or greater than the number of states in the
dfa. Remember that we don't know the dfa, so we can't actually choose m; we
just know that such an m must exist.
 Since string w has length greater than or equal to m, we can break it into two
parts, xy and z, such that xy must contain a cycle. We don't know the dfa, so we
don't know exactly where to make this break, but we know that it can be made
with |xy| ≤ m.
 We let x be the part before the cycle, y be the cycle, and z the part after the
cycle. (It is possible that x and z contain cycles, but we don't care about that.)
Again, we don't know exactly where to make this break.
 Since y is the cycle we are interested in, we must have |y| > 0, otherwise it isn't
a cycle.
 By repeating y an arbitrary number of times, xy*z, we must get other strings in
L.
 If, despite all the above uncertainties, we can show that the dfa has to accept
some string that we know is not in the language, then we can conclude that the
language is not regular.

To use this lemma, we need to show:


1. For any choice of m,
2. for some w ∈ L that we get to choose (and we will choose one of length at least
m),
3. for any way of decomposing w into xyz, so long as |xy| isn't greater than m and
y isn't λ,
4. we can choose an i such that x y^i z is not in L.

We can view this as a game wherein our opponent makes moves 1 and 3
(choosing m and choosing xyz) and we make moves 2 and 4 (choosing w and
choosing i). Our goal is to show that we can always beat our opponent. If we
can show this, we have proved that L is not regular.
Pumping Lemma Example 1
Prove that L = {a^n b^n : n ≥ 0} is not regular.
1. We don't know m, but assume there is one.
2. Choose a string w = a^n b^n where n > m, so that any prefix of length m consists
entirely of a's.
3. We don't know the decomposition of w into xyz, but since |xy| ≤ m, xy must
consist entirely of a's. Moreover, y cannot be empty.
4. Choose i = 0. This has the effect of dropping |y| a's out of the string, without
affecting the number of b's. The resultant string has fewer a's than b's, hence
does not belong to L. Therefore L is not regular.

Q.E.D.
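This argument can be checked mechanically: for w = a^m b^m, every decomposition the opponent could pick is defeated by i = 0. A small Python sketch (the membership test and variable names are ours, for illustration only):

def in_L(s):
    # Membership test for L = {a^n b^n : n >= 0}.
    n = len(s) // 2
    return s == 'a' * n + 'b' * n

m = 5                          # stand-in for the unknown constant m
w = 'a' * m + 'b' * m          # our chosen string, of length >= m
for j in range(1, m + 1):              # j = |xy| <= m
    for k in range(1, j + 1):          # k = |y| > 0
        x, y, z = w[:j-k], w[j-k:j], w[j:]
        # Pumping down (i = 0) removes k a's and no b's, so the
        # result escapes L for every legal decomposition.
        assert not in_L(x + z)
print('i = 0 defeats every legal decomposition of', w)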

Pumping Lemma Example 2


Prove that L = {a^n b^k : n > k and n ≥ 0} is not regular.
1. We don't know m, but assume there is one.
2. Choose a string w = a^n b^k where n > m, so that any prefix of length m consists
entirely of a's, and k = n-1, so that there is just one more a than b.
3. We don't know the decomposition of w into xyz, but since |xy| ≤ m, xy must
consist entirely of a's. Moreover, y cannot be empty.
4. Choose i = 0. This has the effect of dropping |y| a's out of the string, without
affecting the number of b's. The resultant string has fewer a's than before, so it
has either fewer a's than b's, or the same number of each. Either way, the string
does not belong to L, so L is not regular.

Q.E.D.

Pumping Lemma Example 3


Prove that L = {a^n : n is a prime number} is not regular.
1. We don't know m, but assume there is one.
2. Choose a string w = a^n where n is a prime number and |xyz| = n > m+1. (This
can always be done because there is no largest prime number.) Any prefix of w
consists entirely of a's.
3. We don't know the decomposition of w into xyz, but since |xy| ≤ m, it follows
that |z| > 1. As usual, |y| > 0.
4. Since |z| > 1, |xz| > 1. Choose i = |xz|. Then |x y^i z| = |xz| + |y||xz| = (1 + |y|)|xz|.
Since (1 + |y|) and |xz| are each greater than 1, the product must be a composite
number. Thus |x y^i z| is composite, x y^i z is not in L, and L is not regular.

Q.E.D.

Context-Free Grammars
Definition of CFGs
A grammar G = (V, T, S, P) is a context-free grammar (cfg) if all
productions in P have the form

A → x

where
 A ∈ V, and
 x ∈ (V ∪ T)*.

Recall that the general form of a production is

A → x

where
 A ∈ (V ∪ T)+, and
 x ∈ (V ∪ T)*.

Since V ⊆ (V ∪ T)+, the productions for a context-free grammar are a restricted
form of the productions allowed for a general grammar. Thus, a context-free grammar
is a grammar.
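For the examples that follow it helps to have a concrete machine representation of a cfg. Here is one hypothetical choice (single-character symbols, '' standing for the empty string λ), our own and not anything prescribed by the textbook:

# A cfg as a small Python dictionary.
G = {
    'V': {'S'},
    'T': {'a', 'b'},
    'S': 'S',                     # start symbol
    'P': {'S': ['aSb', '']},     # productions; '' is the empty string
}

def is_context_free(G):
    # The definition above: one variable on every left-hand side, and
    # every right-hand side a string over (V union T)*.
    return all(A in G['V'] and all(c in G['V'] | G['T'] for c in rhs)
               for A, rhss in G['P'].items() for rhs in rhss)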

Regular Grammars Are Context Free

Recall that productions of a right-linear grammar must have one of the two forms

A → x
or
A → xB

where
 A, B ∈ V, and
 x ∈ T*.

Since T* ⊆ (V ∪ T)* and T*V ⊆ (V ∪ T)*, it follows that every right-linear
grammar is also a context-free grammar.

Similarly, left-linear grammars and linear grammars are also context-free grammars.
A context-free language (cfl) is a language that can be defined by a context-
free grammar.

Notes on Terminology
Every regular grammar is a context-free grammar, in the same way that every dog is
an animal.

In normal speech we try to be as specific as possible. If we know that, say, Fido is a
dog, we generally refer to Fido as a dog. We don't refer to Fido as an animal (unless
we are trying to be deliberately vague). But if asked whether Fido is an animal, the
correct answer is certainly "yes."

In the same way, if language L is a regular language, we generally refer to L as a
regular language. We don't refer to L as a context-free language unless we are being
deliberately vague. But if asked whether L is a context-free language, the correct
answer is "yes."

The usual convention of being as specific as possible sometimes leads to confusion. If
I say language L is a context-free language, I probably mean either (a) L
is not regular, or (b) I don't know whether L is regular. If I do know that L is a regular
language, I should call it a regular language, not a context-free language.

Languages and Grammars


A regular language is a language that can be defined by a regular grammar.

A context-free language is a language that can be defined by a context-free grammar.

If grammar G is context free but not regular, we know the language L(G) is context
free. We do not know that L(G) is not regular. It might be possible to find a regular
grammar G2 that also defines L(G).

Example
Consider the following grammar:

G = ({S, A, B}, {a, b}, S, {S → AB, A → aA, A → λ, B → Bb, B → λ})

Is G a context-free grammar?
Yes.

Is G a regular grammar?
No.
Is L(G) a context-free language?
Yes.

Is L(G) a regular language?

Yes -- the language L(G) is regular, because it can be defined by the regular grammar

G2 = ({S, A, B}, {a, b}, S, {S → A, A → aA, A → B, B → bB, B → λ})

Example CFGs
Example 1
We have shown that L = {a^n b^n : n ≥ 0} is not regular. Here is a context-free grammar
for this language.

G = ({S}, {a, b}, S, {S → aSb, S → λ})
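Using the dictionary representation suggested earlier, we can generate the short strings of L(G) by expanding the leftmost variable in every possible way, breadth-first. A sketch; the loop terminates here because every production either ends the derivation or adds terminals, though for arbitrary grammars the search needs the restrictions discussed under exhaustive search parsing later:

from collections import deque

def generate(P, start, variables, max_len):
    # Collect the fully terminal strings of length <= max_len.
    results, queue, seen = set(), deque([start]), {start}
    while queue:
        form = queue.popleft()
        i = next((k for k, c in enumerate(form) if c in variables), None)
        if i is None:
            results.add(form)            # no variables left: a sentence
            continue
        for rhs in P[form[i]]:
            new = form[:i] + rhs + form[i+1:]
            terminals = sum(1 for c in new if c not in variables)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results

# generate({'S': ['aSb', '']}, 'S', {'S'}, 4) -> {'', 'ab', 'aabb'}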

Example 2
We have shown that L = {a^n b^k : k > n ≥ 0} is not regular. Here is a context-free
grammar for this language.

G = ({S, B}, {a, b}, S, {S → aSb, S → B, B → bB, B → b}).

Example 3
The language L = {w w^R : w ∈ {a, b}*}, where each string in L is an even-length
palindrome, is not regular. Here is a context-free grammar for this language.

G = ({S}, {a, b}, S, {S → aSa, S → bSb, S → λ}).

More Example CFGs

Example 4
The language L = {w : w ∈ {a, b}*, n_a(w) = n_b(w)}, where each string in L has an equal
number of a's and b's, is not regular. Consider the following grammar:

G = ({S}, {a, b}, S, {S → aSb, S → bSa, S → SS, S → λ}).

1. Does every string generated by this grammar have an equal number of a's
and b's?
2. Is every string consisting of an equal number of a's and b's generated by this
grammar?
Example 5
The language L, consisting of balanced strings of parentheses, is context-free but not
regular. The grammar is simple, but we have to be careful to keep our
symbols ( and ) separate from our metasymbols ( and ).

G = ({S}, {(, )}, S, {S → (S), S → SS, S → λ}).

Sentential Forms
A sentential form is the start symbol S of a grammar or any string in (V ∪ T)* that
can be derived from S.

Consider the linear grammar

({S, B}, {a, b}, S, {S → aS, S → B, B → bB, B → λ}).

A derivation using this grammar might look like this:

S ⇒ aS ⇒ aB ⇒ abB ⇒ abbB ⇒ abb

Each of {S, aS, aB, abB, abbB, abb} is a sentential form.

Because this grammar is linear, each sentential form has at most one variable. Hence
there is never any choice about which variable to expand next.

Leftmost and Rightmost Derivations


Now consider the grammar

G = ({S, A, B, C}, {a, b, c}, S, P)

where
P = {S → ABC, A → aA, A → λ, B → bB, B → λ, C → cC, C → λ}.

With this grammar, there is a choice of variables to expand. Here is a sample
derivation:

S ⇒ ABC ⇒ aABC ⇒ aABcC ⇒ aBcC ⇒ abBcC ⇒ abBc ⇒ abbBc ⇒ abbc

If we always expanded the leftmost variable first, we would have a leftmost
derivation:

S ⇒ ABC ⇒ aABC ⇒ aBC ⇒ abBC ⇒ abbBC ⇒ abbC ⇒ abbcC ⇒ abbc

Conversely, if we always expanded the rightmost variable first, we would have
a rightmost derivation:

S ⇒ ABC ⇒ ABcC ⇒ ABc ⇒ AbBc ⇒ AbbBc ⇒ Abbc ⇒ aAbbc ⇒ abbc

There are two things to notice here:

1. Different derivations result in quite different sentential forms, but
2. For a context-free grammar, it really doesn't make much difference in what
order we expand the variables.

Derivation Trees
Since the order in which we expand the variables in a sentential form doesn't
seem to make any difference (the textbook contains a proof of this), it would be
nice to show a derivation in some way that is independent of the order.
A derivation tree is a way of presenting a derivation in an order-independent
fashion.

For example, for the following derivation:

S ⇒ ABC ⇒ aABC ⇒ aABcC ⇒ aBcC ⇒ abBcC ⇒ abBc ⇒ abbBc ⇒ abbc

we would have the derivation tree:

[figure: derivation tree for abbc]

This tree represents not just the given derivation, but all the different orders in
which the same productions could be applied to produce the string abbc.

A partial derivation tree is any subtree of a derivation tree such that, for any
node of the subtree, either all of its children are also in the subtree, or none of
them are.

The yield of the tree is the final string obtained by reading the leaves of the tree
from left to right, ignoring the λs (unless all the leaves are λ, in which case the
yield is λ). The yield of the above tree is the string abbc, as expected.

The yield of a partial derivation tree that contains the root is a sentential form.

Parsing and Ambiguity


Parsing
There are two ways to use a grammar:
 Use the grammar to generate strings of the language. This is easy -- start with
the start symbol, and apply derivation steps until you get a string composed
entirely of terminals.
 Use the grammar to recognize strings; that is, test whether they belong to the
language. For CFGs, this is usually much harder.

A language is a set of strings, and any well-defined set must have a membership
criterion. A context-free grammar can be used as a membership criterion -- if we can
find a general algorithm for using the grammar to recognize strings.

Parsing a string is finding a derivation (or a derivation tree) for that string.

Parsing a string is like recognizing a string. An algorithm to recognize a string will
give us only a yes/no answer; an algorithm to parse a string will give us additional
information about how the string can be formed from the grammar.

Generally speaking, the only realistic way to recognize a string of a context-free
language is to parse it.

Exhaustive Search Parsing


The basic idea of exhaustive search parsing is this: to parse a string w, generate all
strings in L and see if w is among them.

Problem: L may be an infinite language.

We need two things:

1. A systematic approach, so that we know we haven't overlooked any strings, and
2. A way to stop after generating only a finite number of strings -- knowing that, if
we haven't generated w by now, we never will.

Systematic approaches are easy to find. Almost any exhaustive search technique will
do.

We can (almost) make the search finite by terminating every search path at the point
that it generates a sentential form containing more than |w| terminals.

Grammars for Exhaustive Parsing


The idea of exhaustive search parsing for a string w is to generate all strings of length
not greater than |w|, and see whether w is among them. To ensure that the search is
finite, we need to make sure that we can't get into an infinite loop applying
productions that don't increase the length of the generated string.

Note: for the time being, we will ignore the possibility that λ is in the language.
Suppose we make the following restrictions on the grammar:

 Every variable expands to at least one terminal. We can enforce this by
disallowing productions of the form A → λ.
 Every production either has at least one terminal on its right-hand side (thus
directly increasing the number of terminals), or it has at least two variables
(thus increasing the length of the sentential form). In other words, we
disallow productions of the form A → B, where A and B are both variables.

With these restrictions,

 A sentential form of length n yields a sentence of length at least n.
 Every derivation step increases either the length of the sentential form or the
number of terminals in it.
 Hence, any string w ∈ L can be generated in at most 2|w| - 1 derivation steps.
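Here is a sketch of exhaustive search parsing under these restrictions, using the dictionary-of-productions representation suggested earlier. Because no production has the form A → λ or A → B, a sentential form of length n can only lead to sentences of length at least n, so any form longer than w may be pruned and the search is finite:

from collections import deque

def exhaustive_parse(P, start, variables, w):
    # Breadth-first expansion of sentential forms, leftmost variable
    # first, recording the production applied at each step.
    queue, seen = deque([(start, [])]), {start}
    while queue:
        form, steps = queue.popleft()
        if form == w:
            return steps                 # the (variable, rhs) steps used
        i = next((k for k, c in enumerate(form) if c in variables), None)
        if i is None:
            continue                     # a terminal string other than w
        for rhs in P[form[i]]:
            new = form[:i] + rhs + form[i+1:]
            if len(new) <= len(w) and new not in seen:
                seen.add(new)
                queue.append((new, steps + [(form[i], rhs)]))
    return None                          # w is not in L(G)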

Grammars for Exhaustive Parsing II

We have shown that exhaustive search parsing is a finite process, provided that
there are no productions of the form A → λ or A → B in the grammar. Chapter
6 of your textbook (which we will not cover in this course) describes methods
for removing such productions from a grammar without altering the language
recognized by the grammar. There is, however, one special case we need to
consider.

If λ belongs to the language, we need to keep the production S → λ. This
creates a problem if S occurs on the right-hand side of some production,
because then we have a way of decreasing the length of a sentential form. All
we need to do in this case is to add a new start symbol, say S0, and to replace
the production S → λ with the pair of productions

S0 → λ
S0 → S
Efficient Parsing
Exhaustive search parsing is, of course, extremely inefficient. It requires time
exponential in |w|.

For any context-free grammar G, there are algorithms for parsing strings w ∈ L(G)
in time proportional to the cube of |w|. This is still unsatisfactory for
practical purposes.

There are ways to further restrict context-free grammars so that strings may be
parsed in linear or near-linear time. These restricted grammars are covered in
courses in compiler construction, but will not be considered here. All such
methods do reduce the power of the grammar, thus limiting the languages that
can be recognized. There is no known linear or near-linear algorithm for
parsing strings of a general context-free grammar.

Ambiguity
The following grammar generates strings having an equal number of a's and b's.

G = ({S}, {a, b}, S, S → aSb | bSa | SS | λ)

The string "abab" can be generated from this grammar in two distinct ways, as
shown by the following derivation trees:

[figure: two distinct derivation trees for abab]

Similarly, abab has two distinct leftmost derivations:

S ⇒ aSb ⇒ abSab ⇒ abab
S ⇒ SS ⇒ aSbS ⇒ abS ⇒ abaSb ⇒ abab

Likewise, abab has two distinct rightmost derivations:

S ⇒ aSb ⇒ abSab ⇒ abab
S ⇒ SS ⇒ SaSb ⇒ Sab ⇒ aSbab ⇒ abab

Each derivation tree can be turned into a unique rightmost derivation, or into a
unique leftmost derivation. Each leftmost or rightmost derivation can be turned
into a unique derivation tree. So these representations are largely
interchangeable.

Ambiguous Grammars, Ambiguous Languages
Because derivation trees, leftmost derivations, and rightmost derivations are
equivalent notations, the following definitions are equivalent:

A grammar G is ambiguous if there exists some string w ∈ L(G) for which

 there are two or more distinct derivation trees, or
 there are two or more distinct leftmost derivations, or
 there are two or more distinct rightmost derivations.

Grammars are used in compiler construction. Ambiguous grammars are undesirable
because the derivation tree provides considerable information about the semantics of a
program; conflicting derivation trees provide conflicting information.

Ambiguity is a property of a grammar, and it is usually (but not always) possible to
find an equivalent unambiguous grammar.

An inherently ambiguous language is a language for which no unambiguous grammar
exists.
Nondeterministic Pushdown Automata
Formal Definition of NPDA
A dfa (or nfa) is not powerful enough to recognize many context-free languages
because a dfa can't count. But counting is not enough -- consider a language of
palindromes, containing strings of the form w w^R. Such a language requires more
than an ability to count; it requires a stack.

A nondeterministic pushdown automaton (npda) is basically an nfa with a stack added
to it.

We start with the formal definition of an nfa, which is a 5-tuple, and add two things to
it:

 Γ is a finite set of symbols called the stack alphabet, and
 z ∈ Γ is the stack start symbol.

We also need to modify δ, the transition function, so that it manipulates the stack.

A nondeterministic pushdown automaton or npda is a 7-tuple

M = (Q, Σ, Γ, δ, q0, z, F)

where

 Q is a finite set of states,
 Σ is the input alphabet,
 Γ is the stack alphabet,
 δ is a transition function,
 q0 ∈ Q is the initial state,
 z ∈ Γ is the stack start symbol, and
 F ⊆ Q is a set of final states.

Transition Functions for NPDAs
The transition function for an npda has the form

δ : Q × (Σ ∪ {λ}) × Γ → finite subsets of Q × Γ*

δ is now a function of three arguments. The first two are the same as before:
the state, and either λ or a symbol from the input alphabet. The third argument
is the symbol on top of the stack. Just as the input symbol is "consumed" when
the function is applied, the stack symbol is also "consumed" (removed from the
stack).

Note that while the second argument may be λ rather than a member of the
input alphabet (so that no input symbol is consumed), there is no such option
for the third argument. δ always consumes a symbol from the stack; no move is
possible if the stack is empty.

In the deterministic case, when the function δ is applied, the automaton moves
to a new state q ∈ Q and pushes a new string of symbols x ∈ Γ* onto the stack.
Since we are dealing with a nondeterministic pushdown automaton, the result
of applying δ is a finite set of (q, x) pairs. If we were to draw the automaton,
each such pair would be represented by a single arc.

As with an nfa, we do not need to specify δ for every possible combination of
arguments. For any case where δ is not specified, the transition is to ∅, the
empty set.
Drawing NPDAs
NPDAs are not usually drawn. However, with a few minor extensions, we can
draw an npda similar to the way we draw an nfa.

Instead of labeling an arc with an element of Σ, we can label arcs with a/x,y,
where a ∈ Σ ∪ {λ}, x ∈ Γ, and y ∈ Γ*: read a with x on top of the stack, and
replace the x with the string y.

Consider the following npda (example 7.2 on page 186 in your textbook).

(Q = {q0, q1, q2, q3}, Σ = {a, b}, Γ = {0, 1}, δ, q0, z = 0, F = {q3})

where

δ(q0, a, 0) = {(q1, 10), (q3, λ)}
δ(q0, λ, 0) = {(q3, λ)}
δ(q1, a, 1) = {(q1, 11)}
δ(q1, b, 1) = {(q2, λ)}
δ(q2, b, 1) = {(q2, λ)}
δ(q2, λ, 0) = {(q3, λ)}

This npda can be drawn as:

[figure: transition diagram for this npda]

Note: the top of the stack is considered to be to the left, so that, for example, if we
get an a from the starting position, the stack changes from 0 to 10.
NPDA Execution
Suppose someone is in the middle of stepping through a string with a dfa, and we
need to take over and finish the job. We will need to know two things: (1) the state the
dfa is in, and (2) what the remaining input is. But if the automaton is an npda instead
of a dfa, we also need to know (3) the contents of the stack.

An instantaneous description of a pushdown automaton is a triplet (q, w, u), where

 q is the current state of the automaton,
 w is the unread part of the input string, and
 u is the stack contents (written as a string, with the leftmost symbol at the top
of the stack).

Let the symbol "⊢" indicate a move of the npda, and suppose that δ(q1, a, x) = {(q2,
y), ...}. Then the following move is possible:

(q1, aW, xZ) ⊢ (q2, W, yZ)

where W indicates the rest of the string following the a, and Z indicates the rest
of the stack contents underneath the x. This notation says that in moving from
state q1 to state q2, an a is consumed from the input string aW, and the x at the
top (left) of the stack xZ is replaced with y, leaving yZ on the stack.

Accepting Strings with an NPDA

Suppose you have the npda M = (Q, Σ, Γ, δ, q0, z, F).
How do you use this npda to recognize strings?

To recognize string w, begin with the instantaneous description

(q0, w, z)

where
 q0 is the start state,
 w is the entire string to be processed, and
 z is the start stack symbol.

Starting with this instantaneous description, make zero or more moves, just as you
would with an nfa. There are two kinds of moves that you can make:

 λ-transitions. If you are in state q1, x is the top (leftmost) symbol in the stack,
and δ(q1, λ, x) = {(q2, w2), ...}, then you can replace the symbol x with the
string w2 and move to state q2.
 Nonempty transitions. If you are in state q1, a is the next unconsumed input
symbol, x is the top (leftmost) symbol in the stack, and δ(q1, a, x) = {(q2,
w2), ...}, then you can remove the a from the input string, replace the symbol x
with the string w2, and move to state q2.

If you are in a final state when you reach the end of the string (and maybe make
some λ-transitions after reaching the end), then the string is accepted by the npda. It
doesn't matter what is on the stack.

As usual with nondeterministic machines, the string is accepted if there is any way it
could be accepted. If we take the "oracle" viewpoint, then every time we have to make
a choice, we magically always make the right choice, so we will end in a final state if
at all possible.

Example NPDA Execution

Consider the following npda:

δ(q0, a, 0) = {(q1, 10), (q3, λ)}
δ(q0, λ, 0) = {(q3, λ)}
δ(q1, a, 1) = {(q1, 11)}
δ(q1, b, 1) = {(q2, λ)}
δ(q2, b, 1) = {(q2, λ)}
δ(q2, λ, 0) = {(q3, λ)}

We can recognize the string aaabbb by the following sequence of moves:

(q0, aaabbb, 0)
⊢ (q1, aabbb, 10)
⊢ (q1, abbb, 110)
⊢ (q1, bbb, 1110)
⊢ (q2, bb, 110)
⊢ (q2, b, 10)
⊢ (q2, λ, 0)
⊢ (q3, λ, λ).

Since q3 ∈ F, the string is accepted.
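Nondeterministic execution is easy to mimic with a search over instantaneous descriptions. A sketch, with the transition function as a dictionary keyed by (state, input symbol or '', stack top); the representation and the step bound are our own choices for illustration:

from collections import deque

def npda_accepts(delta, q0, z, finals, w, limit=10000):
    # The stack is a string with its top at the left, as above. 'limit'
    # bounds the search, since lambda-moves could otherwise run forever.
    start = (q0, w, z)
    queue, seen = deque([start]), {start}
    while queue and limit > 0:
        limit -= 1
        q, rest, stack = queue.popleft()
        if rest == '' and q in finals:
            return True              # end of input reached in a final state
        if stack == '':
            continue                 # no move is possible on an empty stack
        x, below = stack[0], stack[1:]
        options = [('', rest)]       # a lambda-move consumes no input
        if rest:
            options.append((rest[0], rest[1:]))
        for a, remaining in options:
            for p, push in delta.get((q, a, x), ()):
                nxt = (p, remaining, push + below)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False

delta = {('q0', 'a', '0'): {('q1', '10'), ('q3', '')},
         ('q0', '',  '0'): {('q3', '')},
         ('q1', 'a', '1'): {('q1', '11')},
         ('q1', 'b', '1'): {('q2', '')},
         ('q2', 'b', '1'): {('q2', '')},
         ('q2', '',  '0'): {('q3', '')}}
# npda_accepts(delta, 'q0', '0', {'q3'}, 'aaabbb') -> True
# npda_accepts(delta, 'q0', '0', {'q3'}, 'aaabb')  -> False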

Accepting Strings with an NPDA (Formal Version)
We have the notation "⊢" to indicate a single move of an npda. We will also use "⊢*"
to indicate a sequence of zero or more moves, and we will use "⊢+" to indicate a
sequence of one or more moves.
If M = (Q, Σ, Γ, δ, q0, z, F) is an npda, then the language accepted by M, L(M), is
given by

L(M) = {w ∈ Σ* : (q0, w, z) ⊢* (p, λ, u), p ∈ F, u ∈ Γ*}.

You should understand this notation.

NPDAs and CFGs

Simplifying Context-Free Grammars
The productions of context-free grammars can be coerced into a variety of forms
without affecting the expressive power of the grammar.

Empty production removal

If the empty string does not belong to a language, then there is a way to eliminate
productions of the form A → λ from the grammar.

If the empty string does belong to a language, then we can eliminate λ from all
productions save for the single production S → λ. In this case we can also eliminate
any occurrences of S from the right-hand side of productions.

Unit production removal

We can eliminate productions of the form A → B, where A and B are both variables,
from a context-free grammar.

Left Recursion Removal

A variable A is left-recursive if it occurs in a production of the form

A → Ax

for some x ∈ (V ∪ T)*. A grammar is left-recursive if it contains at least one left-
recursive variable.

Every context-free language can be represented by a grammar that is not left-
recursive.

Normal Forms of Context-Free Grammars

Chomsky Normal Form
A grammar is in Chomsky Normal Form if all productions are of the form

A → BC
or
A → a

where A, B, and C are variables and a is a terminal. Any context-free grammar whose
language does not contain λ can be put into Chomsky Normal Form.

(Most textbook authors also allow the production S → λ so long as S does not appear
on the right-hand side of any production.)

Chomsky Normal Form is particularly useful for programs that have to manipulate
grammars.

Greibach Normal Form

A grammar is in Greibach Normal Form if all productions are of the form

A → ax

where a is a terminal and x ∈ V*.

Grammars in Greibach Normal Form are typically ugly and much longer than the cfg
from which they were derived. Greibach Normal Form is useful for proving the
equivalence of cfgs and npdas. When we discuss converting a cfg to an npda, or vice
versa, we will use Greibach Normal Form.

From CFG to NPDA


For any context-free grammar in Greibach Normal Form we can build an equivalent
nondeterministic pushdown automaton. This establishes that an npda is at least as
powerful as a cfg.

Key idea: Any string of a context-free language has a leftmost derivation. We set up
the npda so that the stack contents "correspond" to this sentential form; every move of
the npda represents one derivation step.

The sentential form is:

 the characters already read,
 PLUS the symbols on the stack,
 MINUS the final z (the initial stack symbol).

In the npda we will construct, the states are hardly important at all. All the real work
is done on the stack. In fact, we will use only the following three states, regardless of
the complexity of the grammar:

 Start state q0 just gets things initialized. We use the transition from q0 to q1 to
put the grammar's start symbol on the stack.

δ(q0, λ, z) = {(q1, Sz)}

 State q1 does the bulk of the work. We represent every derivation step as a
move from q1 to q1.
 We use the transition from q1 to qf to accept the string.

δ(q1, λ, z) = {(qf, z)}

Example
Consider the grammar G = ({S, A, B}, {a, b}, S, P), where
P = {S → a, S → aAB, A → aA, A → a, B → bB, B → b}.

These productions can be turned into transition functions by rearranging the
components. This yields the following table:

(start)      δ(q0, λ, z) = {(q1, Sz)}
S → a        δ(q1, a, S) = {(q1, λ)}
S → aAB      δ(q1, a, S) = {(q1, AB)}
A → aA       δ(q1, a, A) = {(q1, A)}
A → a        δ(q1, a, A) = {(q1, λ)}
B → bB       δ(q1, b, B) = {(q1, B)}
B → b        δ(q1, b, B) = {(q1, λ)}
(finish)     δ(q1, λ, z) = {(qf, z)}

For example, the derivation

S ⇒ aAB ⇒ aaB ⇒ aabB ⇒ aabb

maps into the sequence of moves

(q0, aabb, z) ⊢ (q1, aabb, Sz) ⊢ (q1, abb, ABz) ⊢ (q1, bb, Bz)
⊢ (q1, b, Bz) ⊢ (q1, λ, z) ⊢ (qf, λ, z)
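The conversion is completely mechanical. A sketch, reusing the representation from the npda simulator sketched earlier (the function name is ours):

def gnf_to_npda(P, start_symbol):
    # Build the three-state npda described above from a grammar in
    # Greibach Normal Form: every right-hand side is one terminal
    # followed by zero or more variables.
    delta = {
        ('q0', '', 'z'): {('q1', start_symbol + 'z')},   # (start)
        ('q1', '', 'z'): {('qf', 'z')},                  # (finish)
    }
    for A, rhss in P.items():
        for rhs in rhss:
            a, x = rhs[0], rhs[1:]    # leading terminal, then variables
            delta.setdefault(('q1', a, A), set()).add(('q1', x))
    return delta, 'q0', 'z', {'qf'}

# P = {'S': ['a', 'aAB'], 'A': ['aA', 'a'], 'B': ['bB', 'b']}
# npda_accepts(*gnf_to_npda(P, 'S'), 'aabb') -> True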
From NPDA to CFG, Part I
We have shown that, for any cfg, we can produce an equivalent npda. We will now
show that, for any npda, we can produce an equivalent cfg. This will establish the
equivalence of cfgs and npdas.

We assert without proof that any npda can be transformed into an equivalent npda that
has the following form:

 The npda has only one final state, which it enters if and only if the stack is
empty;
 All transitions have the form

δ(q, a, A) = {c1, c2, c3, ...}

where each ci has one of the two forms

o (qj, λ)
o (qj, BC)

From NPDA to CFG, Part II

When we write a grammar, we can use any variable names we choose. As in
programming languages, we like to use "meaningful" variable names. When we
translate an npda into a cfg, we will use variable names that encode information
about both the state of the npda and the stack contents. Variable names will
have the form [qiAqj], where qi and qj are states and A is a stack symbol. The
"meaning" of the variable [qiAqj] is that the npda can go from state qi with Ax
on the stack to state qj with x on the stack.

Each transition of the form δ(qi, a, A) = (qj, λ) results in a single grammar rule:

[qiAqj] → a

Each transition of the form δ(qi, a, A) = (qj, BC) results in a multitude of
grammar rules, one for each pair of states qx and qy in the npda:

[qiAqx] → a[qjBqy][qyCqx]

This algorithm results in a lot of useless (unreachable) productions, but the
useful productions define a grammar for the language recognized by the npda.

Deterministic Pushdown Automata

A nondeterministic finite acceptor differs from a deterministic finite acceptor in two
ways:
 The transition function δ is single-valued for a dfa, multi-valued for an nfa.
 An nfa may have λ-transitions.

A nondeterministic pushdown automaton differs from a deterministic pushdown
automaton (dpda) in almost the same ways:
 The transition function δ is at most single-valued for a dpda, multi-valued for
an npda.

Formally: |δ(q, a, b)| = 0 or 1,
for every q ∈ Q, a ∈ Σ ∪ {λ}, and b ∈ Γ.

 Both npdas and dpdas may have λ-transitions; but a dpda may have a λ-
transition only if no other transition is possible.

Formally: If δ(q, λ, b) ≠ ∅,
then δ(q, c, b) = ∅ for every c ∈ Σ.
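These two conditions are easy to check mechanically. A sketch over the same transition-dictionary representation used in the earlier npda sketches ('' again plays the role of λ):

def is_deterministic(delta, states, sigma, gamma):
    for q in states:
        for x in gamma:
            # at most one move for each (q, a, x), with a in sigma or lambda
            if any(len(delta.get((q, a, x), set())) > 1
                   for a in list(sigma) + ['']):
                return False
            # a lambda-move on (q, x) rules out every input move on (q, x)
            if delta.get((q, '', x)) and \
               any(delta.get((q, a, x)) for a in sigma):
                return False
    return True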

A deterministic context-free language is a language that can be recognized by a dpda.

The deterministic context-free languages are a proper subset of the context-free
languages.

A Pumping Lemma for Context-Free Languages
A Pumping Lemma for CFGs
A pumping lemma is a theorem used to show that, if certain strings belong to a
language, then certain other strings must also belong to the language. In this section
we discuss a pumping lemma for context-free languages.

We will show that, if L is a context-free language, then strings of L that are at least m
symbols long can be "pumped" to produce additional strings in L. (The value of m
depends on the particular language.)

Let L be an infinite context-free language. Then there is some positive integer m such
that, if S is a string of L of length at least m, then

 S = uvwxy (for some u, v, w, x, y)
 |vwx| ≤ m
 |vx| ≥ 1
 u v^i w x^i y ∈ L

for all nonnegative values of i.

Let's see what this says.

 If S is a sufficiently long string, then there are two substrings, v and x,
somewhere in S. There is stuff (u) before v, stuff (w) between v and x, and stuff
(y) after x.
 The stuff between v and x won't be too long, because |vwx| can't be larger than
m.
 Substrings v and x won't both be empty (though either one could be).
 If we duplicate substring v some number (i) of times, and duplicate x the same
number of times, the resultant string will also be in L.

Preliminary Definitions
A variable is useful if it occurs in the derivation of some string. This requires that
 the variable occurs in some sentential form (you can get to the variable if you
start from S), and
 a string of terminals can be derived from the sentential form (the variable isn't a
"dead end").

A variable is recursive if it can generate a string containing itself. For example,
variable A is recursive if

A ⇒+ vAx

for some values of v and x.

A recursive variable A can be either

 directly recursive, that is, there is a production A → x1Ax2 for some strings x1,
x2 ∈ (T ∪ V)*, or
 indirectly recursive, that is, there are variables Xi and productions
A → ...X1...
X1 → ...X2...
X2 → ...X3...
...
Xn → ...A...

Proving the Pumping Lemma, Part I


Suppose we have a context-free language L. Then there is some context-free grammar
G that generates L. Further suppose
 L is infinite, hence there is no upper bound on the length of strings belonging to
L.
 L does not contain λ.
 G has no unit productions or λ-productions.

There are only a finite number of variables in a grammar, and the productions for each
variable have finite lengths. The only way that a grammar can generate arbitrarily
long strings is if one or more variables is both useful and recursive.

If a variable is not useful, it does not occur in the derivation of any string of the
language. Useless variables can always be eliminated from a grammar.

Suppose that no variable in the grammar is recursive. Since the start symbol is
nonrecursive, it must be defined only in terms of terminals and other variables. Then
since those variables are nonrecursive, they have to be defined in terms of terminals
and still other variables, and so on. After a while we run out of "other variables" while
the generated string is still finite. Hence there is an upper bound on the length of the
strings that can be generated from the start symbol. This contradicts our premise that
the language is infinite. Therefore, our assumption that no variable is recursive must be
incorrect.

Proving the Pumping Lemma, Part II


Consider a string X belonging to L. If X is sufficiently long, then the derivation of X
must have involved recursive use of some variable A.

Since A was used in the derivation, the derivation must have started as

S ⇒* uAy

for some values of u and y. Since A was used recursively, the derivation must have
continued as

S ⇒* uAy ⇒* uvAxy

Finally, the derivation must have eliminated all variables to reach a string X in the
language:

S ⇒* uAy ⇒* uvAxy ⇒* uvwxy = X

This shows that the derivation steps

A ⇒* vAx
and
A ⇒* w

are possible. Hence the derivation

A ⇒* vwx

must also be possible.

(Notice, by the way, that the above does not imply that A was used recursively only
once. The "*" of "⇒*" could cover many uses of A, as well as other recursive
variables.)

There has to be some "last" recursive step. Consider the longest strings that can be
derived for v, w, and x without the use of recursion. Then there is a number m such
that |vwx| ≤ m.

Since the grammar (by hypothesis) does not contain any λ-productions or unit
productions, every derivation step either introduces a terminal or increases the length
of the sentential form. Since A ⇒+ vAx, it follows that |vx| > 0.

Finally, since uvAxy occurs in the derivation, and A ⇒* vAx and A ⇒* w are both
possible, it follows that u v^i w x^i y also belongs to L for every nonnegative i.

This completes the proof of all parts of the lemma.

Using the Pumping Lemma

The pumping lemma can be used to show that certain languages are not context free.
As an example, we will show that the language L = {a^i b^i c^i : i > 0} is not context-free.

Suppose L is context-free. If string X ∈ L, where |X| > m, it follows that X = uvwxy,
where |vwx| ≤ m. Choose a value for i that is greater than m. Then, wherever vwx
occurs in the string a^i b^i c^i, it cannot contain more than two distinct letters -- it can
be all a's, all b's, all c's, or it can be a's and b's, or it can be b's and c's. Thus the
string vx cannot contain more than two distinct letters; but by the pumping lemma, it
cannot be empty, either, so it must contain at least one letter.

Now we are ready to "pump." Since uvwxy is in L, u v^2 w x^2 y must also be in L. Since v
and x can't both be empty, |u v^2 w x^2 y| > |uvwxy|, so we have added letters. But since vx
does not contain all three distinct letters, we cannot have added the same number of
each letter. Thus u v^2 w x^2 y cannot be in L.

We have arrived at a contradiction. Therefore our original assumption, that L is
context free, must be false. Q.E.D.

Turing Machines
Informal Definition of Turing Machines
A Turing machine is a lot like a pushdown automaton. Both have a finite-state machine as a central component;
both have additional storage. But where a pushdown automaton uses a stack for storage, a Turing machine uses a
tape, which is considered to be infinite in both directions. The tape consists of a series of squares, each of which
can hold a single symbol. The tape head, or read-write head, can read a symbol from the tape, write a symbol to
the tape, and move one square in either direction.

Turing machines can be deterministic or nondeterministic. We will consider only deterministic machines.

Unlike the other automata we have discussed, a Turing machine does not read "input." Instead, there may be (and
usually are) symbols on the tape before the Turing machine begins; the Turing machine might read some, all, or
none of these symbols. The initial tape may, if desired, be thought of as "input."

We have defined acceptors, which produce only a binary (accept/reject) output, and transducers, which can produce
more complicated results. However, all our work so far has been with acceptors. A Turing machine also accepts or
rejects its input. More importantly, the results left on the tape when the Turing machine finishes can be regarded as
the "output" of the computation; thus, a Turing machine is a transducer.

Formal Definition of Turing Machines


A Turing machine M is a 7-tuple

(Q, Σ, Γ, δ, q0, #, F)

where

 Q is a set of states,
 Σ is a finite set of symbols, the input alphabet,
 Γ is a finite set of symbols, the tape alphabet,
 δ is the partial transition function,
 # ∈ Γ is a symbol called blank,
 q0 ∈ Q is the initial state,
 F ⊆ Q is a set of final states.

Because the Turing machine has to be able to find its input, and to know when it has processed all of that input, we
require:

 The tape is initially blank (every symbol is #) except possibly for a finite, contiguous sequence of symbols.
 If there are initially nonblank symbols on the tape, the tape head is initially positioned on one of them.

Most other textbooks make no distinction between Σ (the input alphabet) and Γ (the tape alphabet). Our textbook
makes the distinction to emphasize that the "input" (the nonblank symbols on the tape) does not contain #. Also,
there may be more symbols in Γ than are present in the input.

Transition Function, Instantaneous Descriptions, and Moves
The transition function for Turing machines is given by

δ : Q × Γ → Q × Γ × {L, R}

This means:
When the machine is in a given state (in Q) and reads a given symbol (in Γ) from the tape, it replaces the symbol on
the tape with some other symbol (in Γ), goes to some other state (in Q), and moves the tape head one square left (L)
or right (R).

An instantaneous description or configuration of a Turing machine requires (1) the state the Turing machine is in,
(2) the contents of the tape, and (3) the position of the tape head on the tape. This can be summarized in a string of
the form

x_i...x_j q_m x_k...x_l

where the x's are the symbols on the tape, q_m is the current state, and the tape head is on the square containing x_k
(the symbol immediately following q_m).

A move of a Turing machine can therefore be represented as a pair of instantaneous descriptions, separated by the
symbol "⊢". For example, if

δ(q5, b) = (q8, c, R)

then a possible move might be

abbabq5babb ⊢ abbabcq8abb

Programming a Turing Machine


Just as the productions are the "heart" of a grammar, in that most of the other components can be figured out by
looking at the productions, the transitions are the heart of a Turing machine. These transitions are often given as a
table or list of 5-tuples, where each tuple has the form

(current state, symbol read, symbol written, direction, next state)

Creating such a list is often spoken of as "programming" a Turing machine.

A Turing machine is often defined to start with the read head positioned over the first (leftmost) input symbol. This
isn't really necessary, because if the Turing machine starts anywhere on the nonblank portion of the tape, it's simple
to get to the first input symbol. For the input alphabet Σ = {a, b}, the following program fragment does the trick,
then goes to state q1.

(q0, a, a, L, q0)
(q0, b, b, L, q0)
(q0, #, #, R, q1)

Turing Machines as Acceptors


A Turing machine halts when it no longer has any available moves. If it halts in a final state, it accepts its input;
otherwise, it rejects its input.

This is too easy, so let's repeat it in symbols:

A Turing machine M = (Q, Σ, Γ, δ, q0, #, F) accepts the language L(M), where

L(M) = {w ∈ Σ+ : q0w ⊢* x_i q_f x_j for some q_f ∈ F and x_i, x_j ∈ Γ*}

(Notice that this definition assumes that the Turing machine starts with its tape head positioned on the leftmost
symbol.)

We said a Turing machine accepts its input if it halts in a final state. There are two ways this could fail to happen:

1. The Turing machine could halt in a nonfinal state, or
2. The Turing machine could never stop (in which case we say it is in an infinite loop).

If a Turing machine halts, the sequence of configurations leading to the halt state is called a computation.

Recognizing a Language
This machine will match strings of the form {a^n b^n : n ≥ 0}.
q1 is the only final state.

current | symbol | symbol  |           | next
state   | read   | written | direction | state
------------------------------------------------------
Find the left end of the input
------------------------------------------------------
q0 | a | a | L | q0
q0 | b | b | L | q0
q0 | # | # | R | q1
------------------------------------------------------
Erase the 'a' at the left end of the input
------------------------------------------------------
q1 | a | # | R | q2
------------------------------------------------------
Find the right end of the input
------------------------------------------------------
q2 | a | a | R | q2
q2 | b | b | R | q2
q2 | # | # | L | q3
------------------------------------------------------
Erase the 'b' at the right end of the input
------------------------------------------------------
q3 | b | # | L | q0
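Here is a sketch of a simulator for such a machine, with the program as a dictionary from (state, symbol read) to (symbol written, direction, next state); the representation, the step bound, and the helper names are our own choices:

def run_tm(program, tape_str, start='q0', finals=('q1',), steps=10000):
    # The tape is a dict from position to symbol, blank ('#') everywhere
    # else; the head starts on the leftmost input symbol.
    tape = {i: c for i, c in enumerate(tape_str)}
    state, pos = start, 0
    for _ in range(steps):
        symbol = tape.get(pos, '#')
        if (state, symbol) not in program:
            return state in finals    # no move available: the machine halts
        write, direction, state = program[(state, symbol)]
        tape[pos] = write
        pos += 1 if direction == 'R' else -1
    return None                       # still running after 'steps' moves

anbn = {('q0', 'a'): ('a', 'L', 'q0'), ('q0', 'b'): ('b', 'L', 'q0'),
        ('q0', '#'): ('#', 'R', 'q1'),
        ('q1', 'a'): ('#', 'R', 'q2'),
        ('q2', 'a'): ('a', 'R', 'q2'), ('q2', 'b'): ('b', 'R', 'q2'),
        ('q2', '#'): ('#', 'L', 'q3'),
        ('q3', 'b'): ('#', 'L', 'q0')}
# run_tm(anbn, 'aabb') -> True      run_tm(anbn, 'aab') -> False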

Turing Machines as Transducers


A Turing machine can be used as a transducer. The most obvious way to do this is to treat the entire nonblank
portion of the initial tape as input, and to treat the entire nonblank portion of the tape when the machine halts as
output.

In other words, a Turing machine defines a function y = f(x) for strings x, y ∈ Σ* if

q0x ⊢* qf y

where qf is a final state. (Actually, this definition is a bit stronger than necessary; can you see how?)
A function f is Turing computable if there exists a Turing machine that can perform the above task.

Sorting
Given a string consisting of a's and b's, this machine will rearrange the string so that all the a's come before all the
b's.

current | symbol | symbol  |           | next
state   | read   | written | direction | state
-------------------------------------------------------
Find the left end of the input
-------------------------------------------------------
q0 | a | a | L | q0
q0 | b | b | L | q0
q0 | # | # | R | q1
-------------------------------------------------------
Find the leftmost 'b'
-------------------------------------------------------
q1 | a | a | R | q1
q1 | b | b | R | q2
q1 | # | # | L | final
-------------------------------------------------------
Look for an 'a' to the right of a 'b', replace with 'b'
-------------------------------------------------------
q2 | a | b | L | q3
q2 | b | b | R | q2
q2 | # | # | L | final
-------------------------------------------------------
Already replaced 'a' with 'b', now replace 'b' with 'a'
-------------------------------------------------------
q3 | b | a | L | q0

Universal Turing Machines


Variations on Turing Machines I
According to our textbook, two automata are said to be equivalent if they accept the same language. I will use a
somewhat more general definition: two transducers are equivalent if they compute the same function.

A class of automata (e.g. standard Turing machines) is equivalent to another class of automata (e.g. nondeterministic
Turing machines) if, for each transducer in one class, an equivalent transducer can be found in the other class.

At each move of a Turing machine, the tape head may move either left or right. We may augment this with a stay
option: we will add "don't move" to the set {L, R}.

Theorem. Turing machines with a stay option are equivalent to standard Turing machines.

An n-track Turing machine is one in which each square of the tape holds an ordered n-tuple of symbols from the
tape alphabet. You can think of this as a Turing machine with multiple tape heads, all of which move in lock-step
mode.
Theorem. N-track Turing machines are equivalent to standard Turing machines.

Variations on Turing Machines II


A Turing machine may have a semi-infinite tape; the nonblank input is at the extreme left end of the tape.

Theorem. Turing machines with semi-infinite tape are equivalent to standard Turing machines.

An off-line Turing machine has two tapes. One tape is read-only and contains the input; the other is read-write and is
initially blank.

Theorem. Off-line Turing machines are equivalent to standard Turing machines.

A multitape Turing machine has a finite number of tapes, each with its own independently controlled tape head.

Theorem. Multitape Turing machines are equivalent to standard Turing machines.

A nondeterministic Turing machine is one in which the dfa controlling the tape is replaced with an nfa.

Theorem. Nondeterministic Turing machines are equivalent to standard Turing machines.

Variations on Turing Machines III


A multidimensional Turing machine has a multidimensional "tape;" for example, a two-dimensional Turing machine
would read and write on an infinite plane divided into squares, like a checkerboard. Possible directions that the
tape head could move might be labeled {N, E, S, W}. A three-dimensional Turing machine might have possible
directions {N, E, S, W, U, D}, and so on.

Theorem. Multidimensional Turing machines are equivalent to standard Turing machines.

A binary Turing machine is one whose tape alphabet consists of exactly two symbols.

Theorem. Binary Turing machines are equivalent to standard Turing machines.

A two-state Turing machine is one that has only two states. (It makes up for this by having a large, albeit finite,
alphabet.)

Theorem. Two-state Turing machines are equivalent to standard Turing machines.


Turing Subroutines
Subroutines are extremely useful in standard programming languages. They allow you to break a complex problem
into simpler and simpler subproblems.

Standard Turing machines do not have subroutines, but it's easy to fake them. We do not need to add any new
features. The basic idea is to use one state, or a small group of states, to perform a single task. For example, the
following state can be used to find the left end of the input:

-------------------------------------------------------
q0 | a | a | L | q0
q0 | b | b | L | q0
q0 | # | # | R | q1
-------------------------------------------------------
This "subroutine" can be entered by going to state q0 (which isn't a problem) and it "exits" by going to state q1
(which is a problem). We would like to have the subroutine exit to different states, depending on where it was
called from. The easy way to do this is to make multiple copies of the subroutine, using the same structure but
different state names, e.g.,

-------------------------------------------------------
q41 | a | a | L | q41
q41 | b | b | L | q41
q41 | # | # | R | q87
-------------------------------------------------------
This approach is cumbersome but theoretically adequate, so long as only a finite number of copies are required.

Universal Turing Machines I


A computer, like an automaton, is necessarily hard-wired to execute a single algorithm. For a general-purpose
stored-program computer, that algorithm is basically "fetch an instruction from the current location; execute the
instruction; go to a new location." We can pull the same trick with a Turing machine: we will write a "program"
directly on the tape, along with the "input" to that program. Then we will design a hard-wired universal Turing
machine to emulate the Turing machine described on the tape.

We will store the program as a sequence of 5-tuples: (old state, symbol read, symbol to write, direction, new state).

We want our universal Turing machine to emulate any other Turing machine, even one with a larger tape alphabet.
Hence, we will have to map the emulated alphabet into our own alphabet. The simplest way to do this is with a
unary encoding, e.g. a1=1, a2=11, a3=111, and so on. We can use a second tape symbol, say "0", to separate the
symbols of the emulated alphabet.

Alternatively, we could use a binary notation, e.g. a1=1, a2=10, a3=11, and so on. We would use some other symbol,
say "x", to separate the characters. Many other encoding schemes are possible.

Similarly, we can use strings of digits to encode states, and to encode directions (L and R).

For example, to encode the 5-tuple

(q1, a2, a5, L, q3)

in unary, we could use

...010110111110101110...
This same 5-tuple could be encoded in binary as

...x1x10x101x1x11x...

Universal Turing Machines II


It's a little easier to see how to build a universal Turing machine if we use a three-tape Turing machine.

 One tape holds the "program." This consists of a set of 5-tuples (old state, symbol read, symbol to write,
direction, new state).
 A second tape holds the current "state" of the Turing machine that we are emulating.

 A third tape holds the "input" and will, upon completion, hold the "output." Again, since the tape
alphabet of the emulated Turing machine may be larger than the tape alphabet of our universal Turing
machine, we may need to encode the alphabet. (This is exactly analogous to using eight bits to represent
an ASCII byte.)

We have already noted that a multitape Turing machine is equivalent to a standard Turing machine. In fact, once
you get into the details, it's probably easier to do on a single-tape machine.

We will not complete the development of a universal Turing machine, but it isn't as difficult as you might think. One
of my textbooks has a complete universal Turing machine in only 23 states (and a reasonably small alphabet).

I have heard of a Turing machine that implements a FORTRAN compiler. Some people have too much time on their
hands.

Variations on Turing Machines IV


A linear-bounded automaton is a Turing machine that has only the tape that its input is printed on. There is no
additional blank tape.

Theorem. Linear-bounded automata are not equivalent to standard Turing machines.

The languages recognizable by linear-bounded automata (lbas) are called the context-sensitive languages. We will
briefly touch on context-sensitive languages later in this course.

Turing's Thesis
Alan Turing defined Turing machines in an attempt to formalize the notion of an effective procedure (essentially the
same as what we would call an "algorithm").

At approximately the same time, other mathematicians were independently working on the same problem.

 Alonzo Church -- Lambda calculus


 Emil Post -- Production systems

 Raymond Smullyan -- Formal systems

 Stephen Kleene -- Recursive function theory

 Noam Chomsky -- Unrestricted grammars

All of these formalisms were proved equivalent to one another. This led to
Turing's Thesis (weak form): A Turing machine can compute anything that can be computed by a general-purpose
digital computer.

Turing's Thesis (strong form): A Turing machine can compute anything that can be computed.

The strong form of Turing's thesis cannot be "proved," because it states a relationship between mathematical
concepts and the "real world."

Recursively Enumerable Languages


Counting
Two sets can be put into a one-to-one correspondence if and only if they have exactly the same number of elements.
For example:

{red,   yellow, green,    blue}
   |       |       |        |
{apple, banana, cucumber, plum}

You probably learned to count by putting things into a one-to-one correspondence with your fingers. Now you count
by putting things into a one-to-one correspondence with a subset of the natural numbers (the numbers 1, 2, 3, ...).
Like so:

{red, yellow, green, blue}
   |      |      |     |
{  1,     2,     3,    4 }

In calculus you probably learned that "infinity" is not a number. They lied. Infinity, as a number, is represented by
the symbol ℵ0, pronounced "aleph-null."

A set is denumerable if its elements can be put into a one-to-one correspondence with the natural numbers.
Denumerable sets have ℵ0 elements.

Examples of Denumerable Sets


Example 1. The integers are denumerable. Here is one possible correspondence between the
integers and the natural numbers:

{ 1, 2, 3, 4, 5, 6, 7, 8, 9, ...}
| | | | | | | | |
{ 0, 1, -1, 2, -2, 3, -3, 4, -4, ...}

Notice that, given any natural number N, you can figure out what integer I it corresponds to, and
vice versa:

if even(N) then I:=N/2 else I:=-((N-1)/2);
if I<0 then N:=(-2*I)+1 else N:=2*I;

Since we can put these two sets into a one-to-one correspondence, they must have the same
number of elements, namely, ℵ0.
Example 2. There are as many odd natural numbers as there are even natural numbers. To prove
this, we note the following correspondence:

{ 1, 3, 5, 7, 9, ...}
| | | | |
{ 2, 4, 6, 8, 10, ...}

Example 3. There are as many even natural numbers as there are natural numbers.

{ 1, 2, 3, 4, 5, ...}
| | | | |
{ 2, 4, 6, 8, 10, ...}

Thus, "half of infinity is infinity." More precisely, ℵ0/2 = ℵ0.

Diagonalization
In mathematics (not in computer science!), real numbers are defined to have an infinite number
of digits to the right of the decimal point. Thus, transcendental numbers such as pi and e, as well
as rational numbers such as 2.000000... and 0.171717... are real numbers.

Theorem. The real numbers are not denumerable; there are more real numbers than there are
natural numbers.

We will consider only the real numbers between 0 and 1, or more exactly, the set {x : 0 ≤ x < 1}.

To show that these real numbers are not denumerable, we can't just demonstrate an attempted
correspondence that doesn't work; we need to show that no possible correspondence can work.
We do this by a technique called diagonalization.

Proof. Suppose that there exists a one-to-one correspondence between the natural numbers and
the real numbers. Then list the natural numbers and their corresponding real numbers as shown:

1 . 1 4 1 5 9 2 ...
2 . 1 7 1 7 1 7 ...
3 . 7 1 8 2 8 4 ...
4 . 2 5 0 0 0 0 ...
...

(The actual real numbers shown are for illustrative purposes only; to be more formal we should
represent these numbers in some more abstract way, such as .d1,1d1,2d1,3d1,4...)

We claim this correspondence is complete, so every real number must be found somewhere in
the right column. Now consider some number whose first digit (after the decimal point) is
different from the first digit of the first real number (the 1 shown in boldface above); whose
second digit differs from the second digit of the second real number (7); whose third digit differs
from the third digit of the third real number (8); and so on. For example, the real number we are
constructing might start out .2855... (since 2 ≠ 1, 8 ≠ 7, 5 ≠ 8, 5 ≠ 0, and so on).

We constructed this real number so that it differs from every real number in the right-hand
column by at least one digit. Thus, the number does not appear in the right-hand column. Since
this argument applies to any arbitrary correspondence between the natural numbers and the reals,
no one-to-one correspondence is possible, and the real numbers are not denumerable. Q.E.D.

Nondenumerable Powersets
Theorem. The powerset of an infinite (denumerable) set is not denumerable.

Proof. The proof is by diagonalization.

Put the elements of the infinite set into a one-to-one correspondence with the natural numbers.
(By hypothesis, the set is denumerable, so this step must be possible.) Then we can refer to the
elements of the set as E1, E2, E3, and so on.

The elements of the powerset will be subsets of {E1, E2, E3, ...}. Assume we can put these subsets
into a one-to-one correspondence with the natural numbers, as follows:

E1 E2 E3 E4 E5
______________________________
1 | 0 1 1 0 0 ...
2 | 1 1 0 0 1 ...
3 | 0 0 0 0 1 ...
4 | 0 0 1 1 1 ...
5 | 1 0 1 0 1 ...
... ... ... ... ... ... ...

(In the table, a 1 indicates that the element is present in the subset, and a 0 indicates that it is
absent.) Now construct a new subset, as follows: For each natural number i, element Ei belongs
to this new subset if and only if it doesn't belong to subset i.

Since this new subset differs from every subset in the correspondence by the presence or absence
of at least one element, the correspondence is faulty. But since the only assumption we made
about the correspondence is that it exists, no such correspondence can exist, and the powerset is
not denumerable. Q.E.D.

Turing Machines Are Denumerable


Turing machines can be represented in binary notation. For example, the quintuple

(q5, a2, a3, R, q7)

can be represented as

11111011011101101111111

and the set of quintuples

{(q5, a2, a3, R, q7), (q2, a2, a2, R, q2)}

can be represented as

11111011011101101111111011011011011011.
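This encoding is simple to program. A sketch (the function name is ours; L and R are encoded as 1 and 11, matching the encoded example strings in these notes):

def encode(quintuple):
    # Unary encoding: q_n and a_n become n 1's; fields are separated
    # by single 0's.
    def unary(field):
        if field == 'L':
            return '1'
        if field == 'R':
            return '11'
        return '1' * int(field[1:])   # 'q5' -> '11111', 'a2' -> '11'
    return '0'.join(unary(f) for f in quintuple)

# encode(('q5', 'a2', 'a3', 'R', 'q7')) == '11111011011101101111111'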
Not every binary number represents a Turing machine. For example, a Turing machine can only
move in two directions (L and R), so a sequence ...01110... for direction would not be
meaningful.

Suppose we count in binary, checking each number in turn to see whether it represents a valid
Turing machine. We assign 1 to the first valid Turing machine we encounter (that is, the one
represented by the smallest binary number), 2 to the second such machine, and so on. Since any
Turing machine can be represented in binary, it should be clear that this establishes a one-to-one
correspondence between Turing machines and the natural numbers. Hence, Turing machines are
denumerable.

Recursive and Recursively Enumerable Languages
Remember that there are three possible outcomes of executing a Turing machine over a given
input. The Turing machine may

 Halt and accept the input;
 Halt and reject the input; or
 Never halt.

A language is recursive if there exists a Turing machine that
accepts every string of the language and rejects every string
(over the same alphabet) that is not in the language.

Note that, if a language L is recursive, then its complement -L must also be recursive. (Why?)

A language is recursively enumerable if there exists a Turing machine that accepts every string
of the language, and does not accept strings that are not in the language. (Strings that are not in
the language may be rejected or may cause the Turing machine to go into an infinite loop.)

Clearly, every recursive language is also recursively enumerable. It is not obvious whether every
recursively enumerable language is also recursive.

Note on terminology: Turing machines aren't "recursive." The terminology is borrowed from
recursive function theory (Turing machines are equivalent to general recursive functions). The
terms really don't make sense in this context, so don't worry about trying to make them make
sense.

Enumerating Strings in a Language


To enumerate a set is to place the elements of the set in a one-to-one correspondence with the
natural numbers.
The set of all strings over an alphabet Σ is denumerable (think of a string as a number in an |Σ|-ary
number system). The strings in a language form a subset of the set of all strings over Σ. But can
we enumerate the strings in a language?

If a language is recursive, then there exists a Turing machine for it that is guaranteed to halt. We
can generate the strings of Σ* in a shortest-first order (to guarantee that every finite string will
be generated), test the string with the Turing machine, and if the Turing machine accepts the
string, assign that string the next available natural number.

We can also enumerate the recursively enumerable languages. We have a Turing machine that
will halt and accept any string that belongs to the language; the trick is to avoid getting hung up
on strings that cause the Turing machine to go into an infinite loop. We do this by "time
sharing." Here's how:

W := ∅; N := 0;
for i := 1 to ∞ do {
    add the next string in Σ* to set W;
    initialize a Turing machine for this new string;
    for each string in set W do {
        let the Turing machine for it make one move;
        if the Turing machine halts {
            accept or reject the string as appropriate;
            if the string is accepted {
                N := N + 1;
                let this be string N of the language;
            }
            remove the string from set W;
        }
    }
}
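
The same time-sharing idea in Python, as a generator. The interfaces are assumptions for the
sketch: all_strings() yields Σ* shortest-first, and make_machine(w) returns an object whose
step() method makes one move and reports "accept", "reject", or None (still running).

    def enumerate_language(all_strings, make_machine):
        live = {}                     # string -> its still-running machine
        source = all_strings()
        n = 0
        while True:
            w = next(source)          # admit one new string per round
            live[w] = make_machine(w)
            for s, m in list(live.items()):
                verdict = m.step()    # exactly one move each
                if verdict is not None:
                    del live[s]       # halted, one way or the other
                    if verdict == "accept":
                        n += 1
                        yield n, s    # s is string N of the language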

Non-Recursively Enumerable Languages


A language is any subset of Σ*.

We have shown that Turing machines are enumerable. Since recursively enumerable languages
are those whose strings are accepted by a Turing machine, the set of recursively enumerable
languages is also enumerable.

We have shown that the powerset of an infinite set is not enumerable -- that it has more than ℵ0
subsets. Each of these subsets represents a language. Therefore, there must be languages that are
not computable by a Turing machine.

According to Turing's thesis, a Turing machine can compute any effective procedure. Therefore,
there are languages that cannot be defined by any effective procedure.

We can find a non-recursively enumerable language X by diagonalization. Using the
enumerations described earlier, let string i belong to language X if and only if it does not belong
to language i.

Problem. I've just defined a procedure for defining a non-recursively enumerable language. Isn't
this a contradiction?
When Recursively Enumerable Implies
Recursive
Suppose a language L is recursively enumerable. That means there exists a Turing machine T1
that, given any string of the language, halts and accepts that string. (We don't know what it will
do for strings not in the language -- it could reject them, or it could simply never halt.)

Now let's also suppose that the complement of L, -L = {w : w ∉ L}, is recursively enumerable.
That means there is some other Turing machine T2 that, given any string of -L, halts and accepts
that string.

Clearly, any string (over the appropriate alphabet Σ) belongs to either L or -L. Hence, any string
will cause either T1 or T2 (or both) to halt. We construct a new Turing machine that emulates
both T1 and T2, alternating moves between them. When either one stops, we can tell (by whether
it accepted or rejected the string) to which language the string belongs. Thus, we have
constructed a Turing machine that, for each input, halts with an answer whether or not the string
belongs to L. Therefore L and -L are recursive languages.

We have just proved the following theorem: If a language and its complement are both
recursively enumerable, then both are recursive.
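
A sketch of that construction in Python. T1 and T2 are assumed to be callables returning
one-move steppers, as in the earlier enumeration sketch; step() returns "accept", "reject",
or None.

    def in_L(w, T1, T2):
        # alternate single moves of T1 (accepts exactly L) and T2
        # (accepts exactly -L); one of the two must eventually halt on w
        m1, m2 = T1(w), T2(w)
        while True:
            v1 = m1.step()
            if v1 == "accept": return True    # w in L(T1) = L
            if v1 == "reject": return False   # T1 halted without accepting
            v2 = m2.step()
            if v2 == "accept": return False   # w in L(T2) = -L
            if v2 == "reject": return True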

Recursively Enumerable But Not Recursive


We know that the recursive languages are a subset of the recursively enumerable languages. We
will now show that they are a proper subset.

We have shown how to enumerate strings for a given alphabet, w1, w2, w3, .... We have also
shown how to enumerate Turing machines, T1, T2, T3, .... (Recall that each Turing machine
defines a recursively enumerable language.) Consider the language

L = {wi : wi ∈ L(Ti)}

A little thought will show that L is itself recursively enumerable. But now consider its
complement:

-L = {wi : wi ∉ L(Ti)}

If -L is recursively enumerable, then there must exist a Turing machine that recognizes it. This
Turing machine must be in the enumeration somewhere -- call it Tk.

Does wk belong to L?

 If wk belongs to L then (by the way we have defined L) Tk accepts this string. But Tk accepts only
strings that do not belong to L, so we have a contradiction.
 If wk does not belong to L, then it belongs to -L and is accepted by Tk. But since Tk accepts wk, wk
must belong to L. Again, a contradiction.

We have now defined a recursively enumerable language L and shown by contradiction that -L is
not recursively enumerable.
We mentioned earlier that if a language is recursive, its complement must also be recursive. If
language L above were recursive, then -L would also be recursive, hence recursively
enumerable. But -L is not recursively enumerable; therefore L must not be recursive.

We have therefore shown that L is recursively enumerable but not recursive; the set of
recursive languages is therefore a proper subset of the set of recursively enumerable languages.

Unrestricted Grammars
Definition of Unrestricted Grammars
The productions of a grammar have the form

(V ∪ T)+ → (V ∪ T)*

The other grammar types we have considered (left linear, right linear, linear, context free) restrict
the form of productions in one way or another. An unrestricted grammar does not.

In what follows, we will attempt to show that unrestricted grammars are equivalent to Turing
machines. Bear in mind that

 A language is recursively enumerable if there exists a Turing machine that accepts every string of
the language, and does not accept strings that are not in the language.
 "Does not accept" is not the same as "reject" -- the Turing machine could go into an infinite loop
instead, and never get around to either accepting or rejecting the string.

Our plan of attack is to show that the languages generated by unrestricted grammars are precisely
the recursively enumerable languages.

From Grammars To Turing Machines


Theorem. Any language generated by an unrestricted grammar is recursively enumerable.

This can be proven as follows:

1. If a procedure exists for enumerating the strings of a language, then the language is recursively
enumerable. (We proved this earlier.)
2. There exists a procedure for enumerating all the strings in any language generated by an
unrestricted grammar. (We will demonstrate the procedure shortly.)

3. Therefore, any language generated by an unrestricted grammar is recursively enumerable.

Here's a review of the argument for (1) above. We prove the language is recursively enumerable
by constructing a Turing machine to accept any string w of the language.

 Build one Turing machine that generates the strings of the language in some systematic order.
 Build a second Turing machine that compares its input to w and accepts its input if the two
strings are identical.
 Build a composite Turing machine that incorporates the two machines above, using the output
of the first as input to the second.

Now to systematically generate all the strings of the language. For other types of grammars it
worked to generate shortest strings first; we don't know how to do that with an unrestricted
grammar, because some productions could make the sentential form shorter. It might take a
million steps to derive λ.

Instead, we order the strings shortest derivation first. First we consider all the strings that can be
generated from S in one derivation step, and see if any of them are composed entirely of
terminals. (We can do this because there are only a finite number of productions.) Then we
consider all the strings that can be derived in two steps, and so on.
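
A sketch of the shortest-derivation-first search in Python. The conventions are mine:
productions are (lhs, rhs) string pairs, uppercase letters are variables, and every left-hand side
is assumed to contain a variable (so all-terminal strings are never rewritten further).

    from collections import deque

    def derivable_strings(productions, levels, start="S"):
        # all terminal strings reachable from `start` in at most
        # `levels` derivation steps, found level by level
        frontier, seen, found = deque([start]), {start}, []
        for _ in range(levels):
            for _ in range(len(frontier)):        # one derivation level
                form = frontier.popleft()
                for lhs, rhs in productions:
                    i = form.find(lhs)
                    while i != -1:                # rewrite every occurrence
                        new = form[:i] + rhs + form[i + len(lhs):]
                        if new not in seen:
                            seen.add(new)
                            if any(c.isupper() for c in new):
                                frontier.append(new)
                            else:
                                found.append(new) # all terminals: a string of L
                        i = form.find(lhs, i + 1)
        return found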

From Turing Machines To Grammars I


We have shown that a Turing machine can do anything that an unrestricted grammar can do.
Now we have to show that an unrestricted grammar can do anything a Turing machine can do.
This can be done by using an unrestricted grammar to emulate a Turing machine. We will give
only the barest outline of the proof.

Recall that a configuration of a Turing machine can be written as a string

xi...xjqmxk...xl

where the x's are the symbols on the tape, qm is the current state, and the tape head is on the
square containing xk (the symbol immediately following qm). It makes sense that a grammar,
which is a system for rewriting strings, can be used to manipulate configurations, which can
easily be written as strings.

A Turing machine accepts a string w if

q0w ⊢* xqfy

for some strings x and y and some final state qf, whereas a grammar produces a string if

S ⇒* w.

Because the Turing machine starts with w while the grammatical derivation ends with w, the
grammar we build will run "in reverse" as compared to the Turing machine.

From Turing Machines To Grammars II


Recall that a Turing machine accepts a string w if

q0w ⊢* xqfy

and that our grammar will run "backwards" compared to the Turing machine.

The productions of the grammar we will construct can be logically grouped into three sets:
1. Initialization. These productions construct the string ...#$xqfy#... where # indicates a blank and $
is a special variable used for termination.
2. Execution. For each transition rule of δ we need a corresponding production.

3. Cleanup. Our derivation will leave some excess symbols q0, #, and $ in the string (along with the
desired w), so we need a few more productions to clean these up.

For the terminals T of the grammar we will use the input alphabet of the Turing machine.

For the variables V of the grammar we will use

 Γ - Σ, the tape alphabet minus the symbols we took for T.


 A symbol qi for each state of the Turing machine.

 # (blank) and $ (used for termination).

 S (for a start symbol) and A (used for initialization).

From Turing Machines To Grammars III


Initialization. We need to be able to generate any string of the form
     #...#$xqfy#...#
Since we need an arbitrary number of "blanks" on either side, we use the productions
     S → #S | S# | $A
(The $ is a marker we will use later.) Next we use the A to generate the strings x and y, with a
state qf somewhere in the middle:
     A → sA | As | qf, for all s ∈ Γ.

Execution. For each transition rule of δ we need a corresponding production. For each rule of
the form
     δ(qi, a) = (qj, b, R)
we use a production
     bqj → qia
and for each rule of the form
     δ(qi, a) = (qj, b, L)
we use a production
     qjcb → cqia
for every c ∈ Γ (the asymmetry is because the symbol to the right of q is the one under the Turing
machine's tape head.)

Cleanup. We end up with a string that looks like #...#$q0w#...#, so we need productions to get
rid of everything but the w:
     # → λ
     $q0 → λ
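
A sketch of the execution group in Python, under assumed conventions: states and symbols are
strings, a transition table is a dict mapping (state, symbol) to (state, symbol, move), and a
production is an (lhs, rhs) pair with concatenation standing in for the juxtaposition above.

    def execution_productions(delta, tape_alphabet):
        # group 2 above: one production per transition rule, written
        # so that the grammar runs the machine in reverse
        prods = []
        for (qi, a), (qj, b, move) in delta.items():
            if move == "R":
                prods.append((b + qj, qi + a))          # bqj -> qia
            else:
                for c in tape_alphabet:                 # qjcb -> cqia
                    prods.append((qj + c + b, c + qi + a))
        return prods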

The Chomsky Hierarchy


Context-Sensitive Grammars and Languages
A context-sensitive language is a language generated by a context-sensitive grammar.

There are two different definitions for "context-sensitive grammar," yielding grammars whose
productions look quite different. However, the grammars are equivalent, in that they describe
(almost) the same languages.

(Original definition) A context-sensitive grammar is one whose productions are all of the form

xAy → xvy

where A ∈ V and x, v, y ∈ (V ∪ T)*.

The name "context-sensitive" comes from the fact that the actual string modification is given by
A → v, while the x and y provide the context in which the rule may be applied.

(Extra crispy definition) A context-sensitive grammar is one whose productions are all of the
form

x → y

where x, y ∈ (V ∪ T)+, and |x| ≤ |y|.

Such a grammar is called noncontracting because derivation steps never decrease the length of
the sentential form.

Most modern textbooks use the second definition given above. It can be shown that the two
kinds of grammars are almost equivalent (generate the same languages) with one exception: one
kind of grammar permits languages to contain the empty string, while the other doesn't. (Easy
question: which one permits λ?)

A language L is context-sensitive if there exists a context-sensitive grammar G such that either
L = L(G) or L = L(G) ∪ {λ}.

Notice how this definition carefully sidesteps the question of which kind of context-sensitive
grammar is meant.

Linear-Bounded Automata
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (lba) is a
Turing machine whose tape is only kn squares long, where n is the length of the input (initial)
string and k is a constant associated with the particular linear-bounded automaton.

Some textbooks define an lba to use only the portion of the tape that is occupied by the input;
that is, k=1 in all cases. The definitions lead to equivalent classes of machines, because we can
compensate for the shorter tape by having a larger tape alphabet.

Theorem. For every context-sensitive language L there exists an lba M such that L=L(M), that
is, M accepts exactly the strings of L.
Theorem. For every language L accepted by an lba there exists a context-sensitive grammar that
produces exactly L (or L - {λ}, depending on your definition of context-sensitive grammar).

We will not prove these theorems.

Relationship To Other Grammars


Theorem. Every context-free language is context-sensitive.

Proof. The productions of a context-free language have the form A → v. The productions of a
context-sensitive language have the form xAy → xvy, where x and y are permitted to be λ.
Q.E.D.

Theorem. There exists a context-sensitive language that is not context-free.

Proof. The language {a^n b^n c^n : n ≥ 0} is not context-free (we used a pumping lemma to show
this). We can show that it is context-sensitive by providing an appropriate context-sensitive
grammar. Here are the productions for one such grammar:

     S → aXBC | aBC
     X → aXBC | aBC
     CB → BC
     aB → ab
     bB → bb
     bC → bc
     cC → cc
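
For example, one derivation of aabbcc (the case n = 2) is

     S ⇒ aXBC ⇒ aaBCBC ⇒ aaBBCC ⇒ aabBCC ⇒ aabbCC ⇒ aabbcC ⇒ aabbcc

using, in order, X → aBC, CB → BC, aB → ab, bB → bb, bC → bc, and cC → cc after the
initial S → aXBC.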

Theorem. Every context-sensitive language is recursive.

Proof. A context-sensitive grammar is noncontracting; moreover, for any integer n there are only
a finite number of sentential forms of length n. Therefore, for any string w we can set a bound on
the number of derivation steps required to generate w, hence a bound on the number of possible
derivations. The string w is in the language if and only if one of these derivations produces w.
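
That argument translates directly into a (wildly inefficient) decision procedure. A minimal
sketch, with productions as (lhs, rhs) string pairs as before; because the grammar is
noncontracting, a sentential form longer than w can never lead to w, so the search space is finite
and the procedure always halts.

    def csl_member(productions, w, start="S"):
        seen, frontier = {start}, [start]
        while frontier:
            form = frontier.pop()
            if form == w:
                return True
            for lhs, rhs in productions:
                i = form.find(lhs)
                while i != -1:
                    new = form[:i] + rhs + form[i + len(lhs):]
                    if len(new) <= len(w) and new not in seen:
                        seen.add(new)          # prune forms longer than w
                        frontier.append(new)
                    i = form.find(lhs, i + 1)
        return False

    abc = [("S", "aXBC"), ("S", "aBC"), ("X", "aXBC"), ("X", "aBC"),
           ("CB", "BC"), ("aB", "ab"), ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]
    print(csl_member(abc, "aabbcc"))   # True
    print(csl_member(abc, "aabbc"))    # False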

Theorem. There exists a recursive language that is not context-sensitive.

An appropriate example language is given in the textbook.

The Chomsky Hierarchy


The Chomsky hierarchy, as originally defined by Noam Chomsky, comprises four types of
languages and their associated grammars and machines.
Language                          Grammar                        Machine                               Example

Regular language                  Regular grammar (right-linear  Deterministic or nondeterministic     a*
                                  or left-linear grammar)        finite-state acceptor

Context-free language             Context-free grammar           Nondeterministic pushdown automaton   a^n b^n

Context-sensitive language        Context-sensitive grammar      Linear-bounded automaton              a^n b^n c^n

Recursively enumerable language   Unrestricted grammar           Turing machine                        Any computable function

These languages form a strict hierarchy; that is, regular languages ⊂ context-free languages ⊂
context-sensitive languages ⊂ recursively enumerable languages.

Extending the Chomsky Hierarchy


We have discussed other types of languages besides those in the "classical" Chomsky hierarchy.
For example, we noted that deterministic pushdown automata were less powerful than
nondeterministic pushdown automata. Here is a table of some of the language classes we have
covered that fit readily into this hierarchy.

Language Machine

Regular language Deterministic or nondeterministic finite-state acceptor

Deterministic context-free language Deterministic pushdown automaton

Context-free language Nondeterministic pushdown automaton

Context-sensitive language Linear-bounded automaton

Recursive language Turing machine that halts

Recursively enumerable language Turing machine

Not all language classes fit neatly into a hierarchy. For example, we have discussed the linear
languages, which (like deterministic context-free languages) fit neatly between the regular
languages and the context-free languages; however, there are languages that are linear but not
deterministic context-free, and there are languages that are deterministic context-free but not
linear.

In fact, mathematicians have defined dozens, maybe hundreds, of different classes of languages,
and write papers on how these relate to one another. You should know at least the four "classic"
categories that are taught in almost every textbook on the subject.

A Random-Access Machine
We will define a random-access machine as follows:

Data types. The only data type we will support is the natural numbers 0, 1, 2, 3, .... (However,
numbers may be arbitrarily large.)

Variables. We will allow an arbitrary number of variables, each capable of holding a single
natural number. All variables will be initialized to 0.

Tests. We will allow the following test: <variable> = 0

Statements. Our language will have the following types of statements:

 if <test> then <statement> else <statement>;


 while <test> do <statement>;

 <variable> := <variable> + 1; (increment)

 <variable> := <variable> - 1; (decrement)

(Note: Decrementing a variable whose value is already zero has no effect.)

In addition, we will permit statements to be executed in sequence (<statement>;


<statement>; <statement>; ...), and we will use parentheses to group a sequence of
statements into a single statement.

This begins to look like a "real" programming language, albeit a very weak one. Here's the point:
this language is equivalent in power to a Turing machine. (You can prove this by using the
language to implement a Turing machine, then using a Turing machine to emulate the language.)

In other words: this language is powerful enough to compute anything that can be computed in
any programming language.
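
A sketch of an interpreter for this language in Python. The tuple encoding of statements is my
own; the notes define only the abstract syntax. The example program moves the value of x into y,
showing how "while v = 0" plus a flag variable simulates the missing "while v ≠ 0" test.

    from collections import defaultdict

    def run(stmt, env=None):
        env = defaultdict(int) if env is None else env   # all variables start at 0
        op = stmt[0]
        if op == "inc":
            env[stmt[1]] += 1
        elif op == "dec":
            env[stmt[1]] = max(0, env[stmt[1]] - 1)      # decrementing 0 has no effect
        elif op == "if":                                 # ("if", v, then, else): test is v = 0
            run(stmt[2] if env[stmt[1]] == 0 else stmt[3], env)
        elif op == "while":                              # ("while", v, body): loop while v = 0
            while env[stmt[1]] == 0:
                run(stmt[2], env)
        elif op == "seq":
            for s in stmt[1:]:
                run(s, env)
        return env

    prog = ("seq",
            ("inc", "x"), ("inc", "x"), ("inc", "x"),    # x := 3
            ("while", "f",                               # while f = 0 do
             ("if", "x", ("inc", "f"),                   #   if x = 0 then f := f + 1
              ("seq", ("dec", "x"), ("inc", "y")))))     #   else move one unit from x to y
    print(dict(run(prog)))   # {'x': 0, 'f': 1, 'y': 3}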

The Turing Tar-Pit


Turing tar-pit /n./

1. A place where anything is possible but nothing of interest is practical. Alan Turing helped lay
the foundations of computer science by showing that all machines and languages capable of
expressing a certain very primitive set of operations are logically equivalent in the kinds of
computations they can carry out, and in principle have capabilities that differ only in speed from
those of the most powerful and elegantly designed computers. However, no machine or language
exactly matching Turing's primitive set has ever been built (other than possibly as a classroom
exercise), because it would be horribly slow and far too painful to use. A 'Turing tar-pit' is any
computer language or other tool that shares this property. That is, it's theoretically universal --
but in practice, the harder you struggle to get any real work done, the deeper its inadequacies
suck you in. Compare bondage-and-discipline language.

2. The perennial holy wars over whether language A or B is the "most powerful".

The Halting Problem (Undecidable Problems)
Program Self-Application
In computer science, we often write programs that process other programs.

 A compiler reads a program in one language and produces the equivalent program in another
language.
 A preprocessor reads a program with embedded processor commands and produces an
equivalent program without those commands.

 A prettyprinter reads a program and writes a "cleaned up" version of the same program.

Such a program can sometimes be applied to itself.

 A prettyprinter, written in and for the same language, can prettyprint itself.
 A compiler can be written in language X for a new, improved version of language X. This process
is called bootstrapping.

The input to a Turing machine is a string. Turing machines themselves can be written as strings,
and these strings can be used as input to other Turing machines.

In particular, we have already discussed the notion of a universal Turing machine whose input
consists of a description M of some arbitrary Turing machine, and some input w to which machine
M is to be applied (we will write this combined input as M+w), and which produces the same
output that M would produce on w. We could write this as UTM(M+w) = M(w).

Since a Turing machine can be represented as a string, it is entirely possible to supply a Turing
machine as input to itself, e.g. UTM(UTM).

The Halting Problem II


This page uses Boolean functions, written in a pseudo-programming language, because I think
they are easier for most people to understand than a strictly formal Turing-machine notation.
Except for notation, the argument presented here is exactly the same as the one in your textbook.

Suppose we have a Turing machine WillHalt which, given an input string M+w, will halt and
accept the string if Turing machine M halts on input w, and will halt and reject the string if Turing
machine M does not halt on input w. Viewed as a Boolean function, WillHalt(M,w) halts and
returns true in the first case, and halts and returns false in the second.

We will prove by contradiction that

Theorem. Turing machine WillHalt(M,w) cannot exist.

The Halting Problem III


There is a simpler, though less satisfying, way to prove that the halting problem is insoluble. In
fact, we have come very close to proving this already in our discussion of recursive and
recursively enumerable languages.

Recall that

 A language is recursively enumerable if there exists a Turing machine that accepts every string in
the language and does not accept any string not in the language.
 A language is recursive if there exists a Turing machine that accepts every string in the language
and rejects every string not in the language.

Proof. If a Turing machine WillHalt could be built, then we could readily build the machine

function Accepts (M, w):
    if WillHalt (M, w) then
        return M (w);
    else
        return False;

This Turing machine will always halt and always give the correct answer. If we could build
WillHalt, then we could build Accepts. If we could build Accepts, then, since Accepts(M, w)
always halts and correctly decides whether M accepts w, we would have shown that every
recursively enumerable language is recursive.

However, we showed earlier that there do exist recursively enumerable languages that are not
recursive. (You might wish to review this argument.) Therefore, it must be impossible to build
WillHalt.
Q.E.D.

Implications for Programming


It is important to understand what the Halting Problem does and doesn't say, and how this applies
to computer programming.

First, every existing programming language, and every foreseeable programming language, has no
more power than a Turing machine. Hence, this result applies directly to programming
languages.

The theorem does not say that we can never determine whether or not a given program halts on a
given input. Most of the time, for most practical programs, we can and do eliminate infinite
loops from our programs. We can even write a meta-program to check another program for
potential infinite loops, and get this meta-program to work most of the time.
The theorem does say that we cannot ever write such a meta-program and have it work all of the
time. Moreover, the result can be used to demonstrate that certain other programs are also
impossible. Here's the basic outline:

 If we could solve problem X, we could solve the Halting Problem.


 We can't solve the Halting Problem.

 Therefore, we can't solve problem X.

Sometimes you will be able to solve a useful, practical subset of problem X. However, unless
you have a particularly understanding customer, you are generally better off avoiding problem X
altogether.

Implications for Artificial Intelligence


None.

Some philosophers have tried to use the Halting Problem as an argument against the possibility
of intelligent computers. Stripped to its basics, the argument goes like this:

 There are things computers cannot do.


 We can do those things.

 Therefore, we must be superior to computers.

The first premise is undeniably true.

The second premise is generally supported by displaying a program that solves some subset of
the Halting Problem, then describing a clever trick (not incorporated into the program) that
solves a slightly larger subset.

There may well be valid arguments against the possibility of artificial intelligence. This is not
one of them.

Reduction to the Halting Problem


To reduce problem X to the Halting Problem:

 Assume that you have an effective procedure (Turing machine or any other kind of algorithm) to
solve problem X.
 Show how to use the program for X to solve the Halting Problem.

 Conclude that problem X can't be solved.

The state-entry problem

(I prefer to think of this as the dead code problem.) The problem is to determine whether Turing
machine M, when given input w, ever enters state q.
The only way a Turing machine M halts is if it enters a state q for which some transition
δ(qi, ai) is undefined. Add a new final state Z to the Turing machine, and add all these missing
transitions to lead to state Z.

Now use the (assumed) state-entry procedure to test whether state Z is ever entered when M is
given input w. This will reveal whether the original machine M halts. We conclude that it must
not be possible to build the assumed state-entry procedure.
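
The reduction pattern, sketched in Python. The representation (a dict mapping (state, symbol) to
(state, symbol, move)) is an assumption for illustration, and enters_state stands for the
decision procedure we just proved cannot exist.

    def add_trap_state(delta, states, tape_alphabet):
        # complete a partial transition table so that every missing
        # entry leads to a single new state 'Z'
        full = dict(delta)
        for q in states:
            for a in tape_alphabet:
                full.setdefault((q, a), ("Z", a, "R"))
        return full, "Z"

    def would_halt(delta, states, tape_alphabet, w, enters_state):
        # M halts on w  iff  the completed machine ever enters Z on w,
        # so a state-entry solver would yield a halting solver
        full, Z = add_trap_state(delta, states, tape_alphabet)
        return enters_state(full, w, Z)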

Other Unsolvable Problems


 Does a given Turing machine M halt on all inputs?
 Does Turing machine M halt for any input? (That is, is L(M) = ∅?)

 Does Turing machine M halt when given a blank input tape?

 Do two Turing machines M1 and M2 accept the same language?

 Is the language L(M) finite?

 Does L(M) contain any two strings of the same length?

 Does L(M) contain a string of length k, for some given k?

Post's Correspondence Problem


Let Σ be a finite alphabet, and let A and B be two lists of nonempty strings over Σ, with |A| = |B|.
That is,

A = (w1, w2, w3, ..., wk) and B = (x1, x2, x3, ..., xk)

Post's Correspondence Problem is the following. Does there exist a sequence of integers i1, i2,
i3, ..., im such that m ≥ 1 and

wi1wi2wi3...wim = xi1xi2xi3...xim ?

Example. Suppose A = (a, abaaa, ab) and B = (aaa, ab, b). Then the required sequence of
integers is 2,1,1,3, giving

abaaa a a ab = ab aaa aaa b.

This example had a solution. It will turn out that Post's correspondence problem is insoluble in
general.
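
A brute-force search is easy to write; what the unsolvability result says is that no refinement of
it can be guaranteed to terminate on every no-instance. A minimal sketch (the max_len cutoff is
mine, precisely because the loop might otherwise never stop):

    from itertools import product

    def pcp_search(A, B, max_len=8):
        k = len(A)
        for m in range(1, max_len + 1):          # try sequences of length 1, 2, ...
            for seq in product(range(k), repeat=m):
                if "".join(A[i] for i in seq) == "".join(B[i] for i in seq):
                    return [i + 1 for i in seq]  # 1-based indices, as in the text
        return None                              # none found up to the cutoff

    A = ("a", "abaaa", "ab")
    B = ("aaa", "ab", "b")
    print(pcp_search(A, B))    # [2, 1, 1, 3]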

Church's Thesis
Formal Systems
Two thousand years ago, Euclid set a standard for rigor in geometrical proofs. The rest of
mathematics has never succeeded in reaching this standard.
The required properties of a satisfactory formal system are that it be

 complete -- it must be possible either to prove or to disprove any proposition that can be
expressed in the system.
 consistent -- it must not be possible to both prove and disprove a proposition in the system.

A number of mathematicians have attempted to put mathematics on a firmer, more logical
footing. A major effort was mounted at the end of the 19th century by Alfred North Whitehead
and Bertrand Russell; their Principia Mathematica attempted to use axiomatic set theory to form
a foundation for all of mathematics. This attempt foundered when it was discovered that set
theory is not consistent.

Here is the now-famous problem that demolished the Principia Mathematica. Consider the set of
all sets that do not have themselves as a member. Is this set a member of itself?

Kurt Gödel explored the very notions of completeness and consistency. He invented a numbering
scheme (Gödel numbers) that allowed him to express proofs as numbers (much as we might
consider a computer program to be a very large binary number). He was able to prove the
following result:

If it is possible to prove, within a formal system, that the system is consistent, then the formal
system is not, in fact, consistent.

Or, equivalently,

If a formal system is consistent, then it is impossible to prove (within the system) that it is
consistent.

This result sets very definite limits on the kinds of things that we can know. In particular, it
shows that any attempt to prove mathematics consistent is foredoomed to failure.

Recursive Function Theory


Gödel proved that a sufficiently powerful formal system cannot be both consistent and complete.
(Simple arithmetic on integers is an example of a system that is "sufficiently powerful.") We
prefer to give up completeness rather than consistency, because in an inconsistent system any
proposition can be proven.

Gödel left open the possibility that we could somehow distinguish between the provable
propositions and the unprovable ones. Ideally, we would like to have a mechanical (algorithmic)
theorem-proving procedure. Alan Turing invented Turing machines in an attempt to solve this
problem. With the Halting Problem, he showed that we cannot, in all cases, distinguish between
soluble and insoluble problems.

Other mathematicians, working with very different models of computation, ended up with very
similar results. One of these was Alonzo Church, who invented recursive function theory.
I have sometimes referred to this course, Formal Languages and Automata Theory, as "compiler
construction made difficult." A fairer statement is that this course presents a mathematician's
view of the subject, while a course in Compiler Construction presents a programmer's view.

In the same way, recursive function theory is "Lisp made difficult." If, like me, you understand
programming more readily than mathematics, learn Lisp before you take a course in recursive
function theory. You would not believe the difference it will make.

Primitive Recursive Functions


The purpose of this section is to give you some idea of the flavor of recursive function theory.

The textbook describes these functions as being over the natural numbers I={0,1,2,3,...}. A better
way to look at recursive functions, though, is as pure symbol systems. Numbers are not used in
the system; rather, we use the system to construct both numbers and arithmetical functions on
numbers. In other words, it's a different numbering system, in the same way that Roman
numerals are different. The correspondence goes like this:

z(x)=0, s(z(x))=1, s(s(z(x)))=2, s(s(s(z(x))))=3, ...

To "translate" to decimal, just count the number of s's surrounding the central z(x).

Now let's get formal. In this system there are only a few basic functions:

 The zero function: z(x)=z(y) for all x, y ∈ I. (This is our "zero"; it is written as a function so we don't
have to introduce constants into the system.)
 The successor function: s(x). Informally, this means "x+1". Formally, it doesn't "return a value", it
just sits there: the result of s(x) is s(x).
For convenience, we make the following "abbreviations":

o 0 is shorthand for z(x).

o 1 is shorthand for s(z(x)).

o 2 is shorthand for s(s(z(x))).

o 3 is shorthand for s(s(s(z(x)))).

o ...and so on.

 The projector functions:

o p1(x) = x.

o p1(x, y) = x.

o p2(x, y) = y.

The projector (or "pick") functions are just a way of extracting one of the parameters and
discarding the rest. The book defines only p1 and p2 because it uses functions of no more
than two arguments.
Composition and Recursion
If g1, g2, g3, and h are previously defined functions, we can combine them to form new functions.
In a very careful, formal development, they can be combined only in precisely defined ways.
Here is an example of the kind of form required.

 Composition: f(x,y) = h(g1(x,y), g2(x,y)).
This lets us use functions as arguments to functions.
 Primitive recursion. This is a highly structured "recursive routine" with exactly this form:
f(x,0) = g1(x)
f(x, s(y)) = h(g2(x,y), g3(f(x,y)))

Note: The form of primitive recursion given in the textbook is not consistent with the
author's examples. If you want to step through his (or my) examples, use this form
instead.

A primitive recursive function is a function formed from the functions z, s, p1, and p2 by using
only composition and primitive recursion.

Composition and Recursion, Revisited


Just as you can have lots of minor variations on a Turing machine that all turn out to be
equivalent, so too you can have lots of minor variations on the definition of primitive recursive
functions. Here I'm going to try to highlight the important aspects of the definitions.

Composition

The important thing here is that each function be defined only in terms of previously defined
functions. This restriction prevents unwanted kinds of recursion from creeping in, such as
indirect recursion (A calls B, B calls C, C calls A).

Many programming languages have a restriction that you cannot reference a variable until after it
is defined. This is the same kind of restriction.

Primitive recursion

Primitive recursive functions may only use a simple form of recursion. In particular,

 The recursion must be guaranteed to terminate. To ensure this, the function must carry along an
extra parameter that is "decremented" each time the function is called (s(x) is replaced by x),
and halts the recursion when it reaches "zero" (z(x)). That is,

f(..., z(x)) = ...

f(..., s(x)) = ... f(..., x), ...

 The recursive function must appear only once in the definiens (right hand side of the definition).
This restriction prevents various forms of "fancy" recursion.
General recursive functions, which don't have these restrictions, are more powerful than
primitive recursive functions.

Examples I
The following examples show how these can be used to define more complicated functions. My
examples are taken from those in the textbook, but I prefer the notation s(x) to the abbreviation
x+1.

Example. Addition of two numbers.


Strictly according to the form, this is:

     add(x, z(x)) = g1(x)
     add(x, s(y)) = h(g2(x,y), g3(add(x,y)))

By choosing g1=p1, g2=p1, g3=s, and h=p2, we get

     add(x, z(x)) = p1(x)
     add(x, s(y)) = p2(p1(x,y), s(add(x,y)))

which simplifies to

     add(x, z(x)) = x
     add(x, s(y)) = s(add(x,y))

For example, add(3,2) works as follows:

     add(s(s(s(z(x)))), s(s(z(x))))
     s(add(s(s(s(z(x)))), s(z(x))))
     s(s(add(s(s(s(z(x)))), z(x))))
     s(s(s(s(s(z(x))))))
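
The simplified equations can be transcribed into Python almost literally. This toy treats
numerals as literal strings of nested s(...) applications, matching the "pure symbol system"
view above; the string surgery is just one convenient encoding, not part of the theory.

    def z(x="x"):                   # the zero function ignores its argument
        return "z(x)"

    def s(n):                       # the successor function "just sits there"
        return "s(" + n + ")"

    def add(x, y):
        # add(x, z(x)) = x ;  add(x, s(y)) = s(add(x, y))
        if y == z():
            return x
        return s(add(x, y[2:-1]))   # strip one s( ... ) layer from y

    def to_decimal(n):              # count the s's around the central z(x)
        return n.count("s(")

    three, two = s(s(s(z()))), s(s(z()))
    print(add(three, two))                # s(s(s(s(s(z(x))))))
    print(to_decimal(add(three, two)))    # 5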

Examples II
Example. Multiplication of two numbers.

The key new feature here is the use of a previously defined function, add, in the definition of a
new function. We skip the step of playing around with the pi functions to pick out the right parts,
and go right to the simplified form.

multiply(x, z(x)) = z(x)
multiply(x, s(y)) = add(x, multiply(x, y))

Example. Predecessor.

The trick here is that we can't drop below zero, so effectively 0-1=0. To show this, we write a dot
above the minus sign and call it a monus (and no, you don't need to remember this!). In any case,
the function is easy to define:

pred(z(x)) = z(x)
pred(s(x)) = x

(We have taken some liberties with the notation, because the form allowed by the textbook
doesn't allow us to define a function of one variable; we would have to define a function of two
variables, and just ignore one of them.)
Example. Subtraction.

subtract(x, z(x)) = x
subtract(x, s(y)) = pred(subtract(x, y))

Ackermann's Function
Ackermann's function is an example of a function that is mu-recursive but not primitive
recursive. (Mu-recursive functions have the power of a Turing machine.) Here is the definition:

A(0, y) = y + 1
A(x, 0) = A(x - 1, 1)
A(x, y) = A(x - 1, A(x, y - 1))

Looks innocuous, doesn't it?
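
Here it is in Python, if you want to play along. Memoization and a raised recursion limit help a
little; nothing helps much, which is rather the point.

    import sys
    from functools import lru_cache

    sys.setrecursionlimit(100_000)

    @lru_cache(maxsize=None)
    def A(x, y):
        if x == 0:
            return y + 1
        if y == 0:
            return A(x - 1, 1)
        return A(x - 1, A(x, y - 1))

    print(A(2, 3))   # 9
    print(A(3, 3))   # 61
    # A(4, 2) already has 19,729 digits; don't wait up for A(5, 5).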

Ackermann's function is one of the few things I actually remember from the recursive function
theory course I took many long years ago. It's just a really neat function. Play with it a bit and
you'll see what I mean.

Good uses for Ackermann's function

Stress-test your computer. See just how many values of Ackermann's function you can
compute.

Liven up a boring meeting. Instead of sitting there doodling, bring in a copy of Ackermann's
function and see how far you can get with it. If you have ever played with numbers, I guarantee
it will be a lot more interesting than drawing random designs.

Test your programming skills. There are a lot of short cuts you can find to help compute
Ackermann's function much faster. How many of them can you find?

Make money fast. Bet a hotshot programmer that s/he can't write a program in less than an hour
to compute A(5,5). Or be generous -- give them a week.

Turing's Thesis and Church's Thesis


I've always used the two terms interchangeably, but your textbook makes a distinction, so here it
is.

Turing's thesis. Anything that is computable can be computed by a Turing machine. There does
not and cannot exist a machine that can compute things a Turing machine cannot compute.

Church's thesis. All the models of computation yet developed, and all those that may be
developed in the future, are equivalent in power. We will not ever find a more powerful model.

P and NP
Complexity Theory
Complexity theory concerns itself with two kinds of measures: time and space.

Time complexity is a measure of how long a computation takes to execute. For a Turing machine,
this could be measured as the number of moves required to perform a computation. For a digital
computer, it could be measured as the number of machine cycles required for the computation.

Space complexity is a measure of how much storage is required for a computation. For a Turing
machine, the obvious measure is the number of tape squares used; for a digital computer, the
number of bytes used.

Both of these measures are functions of a single input parameter, the size of the input. Again, this
can be defined in terms of squares or bytes.

For any given input size, different inputs typically require different amounts of space and time.
Hence we can discuss complexity for either the average case or the worst case. Usually we are interested
in worst-case complexity because

 It may be difficult or impossible to define an "average" case. For many problems, the notion of
"average case" doesn't even make sense.
 It is usually much easier to compute worst-case complexity.

In complexity theory we generally subject our equations to some extreme simplifications. For
example, if a given algorithm takes exactly 5n^3 + 2n^2 - n + 1003 machine cycles, where n is the size
of the input, we will simplify this to O(n^3) (read: Order n-cubed). This is called an order
statistic. Specifically, we:

 Drop all terms except the highest-ordered one, and
 Drop the coefficient of the highest-ordered term.

Justifications for this procedure are:

 For very large values of n, the effect of the highest-order term completely swamps the
contribution of the lower-ordered terms. We are interested in large values of n because, from a
strictly practical point of view, it is the large problems that give us trouble. Small problems are
almost always feasible to compute.
 Tweaking the code can improve the coefficients, but the order statistic is a function of the
algorithm itself.

Polynomial-Time Algorithms
A polynomial-time algorithm is an algorithm whose execution time is either given by a
polynomial on the size of the input, or can be bounded by such a polynomial. Problems that can
be solved by a polynomial-time algorithm are called tractable problems.
For example, most algorithms on arrays can use the array size, n, as the input size. To find the
largest element in an array requires a single pass through the array, so the algorithm for doing
this is O(n), or linear time.

Sorting algorithms usually require either O(n log n) or O(n^2) time. Bubble sort takes linear time
in the best case, but O(n^2) time in the average and worst cases. Heapsort takes O(n log n) time in
all cases. Quicksort takes O(n log n) time on average, but O(n^2) time in the worst case.

Regarding O(n log n) time, note that

 The base of the logarithms is irrelevant, since the difference is a constant factor, which we
ignore; and
 Although n log n is not, strictly speaking, a polynomial, the size of n log n is bounded by n^2,
which is a polynomial.

Probably all the programming tasks you are familiar with have polynomial-time solutions. This
is not because all practical problems have polynomial-time solutions. Rather, it is because your
courses and your day-to-day work have avoided problems for which there is no known practical
solution.

Nondeterministic Polynomial-Time
Algorithms
Recall that a nondeterministic computation can be viewed in either of two ways:

 When a choice point is reached, an infallible oracle can be consulted to determine the correct
choice.
 When a choice point is reached, all choices can be made and computation can proceed
simultaneously.

A nondeterministic polynomial-time algorithm is an algorithm that can be executed in
polynomial time on a nondeterministic machine. The machine can either consult an oracle (in
constant time), or it can spawn an arbitrarily large number of parallel processes.

It should be obvious that this would be a nice machine to have.

Integer Bin Packing


Suppose we are given a set of n positive integers. Our task is to arrange these integers into two
piles, or "bins", so that the sum of the integers in one pile is equal to the sum of the integers in
the other pile.

For example, consider the integers

(19, 23, 32, 42, 50, 62, 77, 88, 89, 105, 114, 123, 176)

These numbers sum to 1000. Can they be divided into two bins, bin A and bin B, such that the
sum of the integers in each bin is 500?
There is an obvious nondeterministic algorithm: for each number, put it in the correct bin. This
requires linear time.

There is also a fairly easy deterministic algorithm. There are 13 numbers (n=13), so form the 13-
bit binary number 0000000000000. For i ranging from 1 to 13: if bit i is zero, put integer i into
bin A; if bit i is one, put integer i into bin B. Test the resultant arrangement. If you don't have a
solution yet, add 1 to the binary number and try again. If you reach 1111111111111, stop and
conclude that there is no solution.

This is a fairly simple algorithm; the only problem is that it takes O(2^n) time, that is, exponential
time. In the above example, we may need to try as many as 2^13 arrangements. This is fine for
small values of n (such as 13), but becomes unreasonable for large values of n.
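
The binary-counting algorithm, sketched in Python (bit i of the counter says which bin the
i-th integer goes into):

    def split_into_bins(nums):
        total = sum(nums)
        if total % 2:
            return None                       # odd total: no equal split exists
        for bits in range(2 ** len(nums)):    # count from 000...0 up to 111...1
            bin_a = [x for i, x in enumerate(nums) if not bits >> i & 1]
            if sum(bin_a) == total // 2:      # bin B then holds the other half
                bin_b = [x for i, x in enumerate(nums) if bits >> i & 1]
                return bin_a, bin_b
        return None

    nums = (19, 23, 32, 42, 50, 62, 77, 88, 89, 105, 114, 123, 176)
    print(split_into_bins(nums))   # two bins, each summing to 500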

You can find many shortcuts for problems such as this, but the best you can do is improve the
coefficients. The time complexity remains O(2^n). Problems that require exponential time are
referred to as intractable problems.

By the way, there are many variations on this problem.

 You can have multiple bins.


 You can have a single bin, and the object is to pack as much as possible into it.

 You can pack objects with multiple dimensions (volume and weight, for example).

 Objects have not only a size but also a value, and the object is to pack as much value as possible.

Boolean Satisfiability
Suppose you have n Boolean variables, named A, B, C, ..., and you have an expression in the
propositional calculus (that is, you can use and, or, and not to form the expression.) Is there an
assignment of truth values to the variables (e.g. A=true, B=true, C=false, ....) that will make the
expression true?

Here is a nondeterministic algorithm to solve the problem: For each Boolean variable, assign it
the proper truth value. This is a linear algorithm.

We can find a deterministic algorithm for this problem in much the same way as we did for the
integer bin problem. Effectively, the idea is to set up a systematic procedure to try every possible
assignment of truth values to variables. The algorithm terminates when a satisfactory solution is
found, or when all 2^n possible assignments have been tried. Again, the deterministic solution
requires exponential time.
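
The exhaustive procedure in Python. Representing the expression as a plain predicate over an
assignment dict is a convenience of the sketch, not a standard encoding:

    from itertools import product

    def brute_force_sat(variables, expr):
        for values in product((True, False), repeat=len(variables)):
            assignment = dict(zip(variables, values))
            if expr(assignment):
                return assignment          # a satisfying assignment
        return None                        # unsatisfiable: all 2^n tried

    # (A or B) and (not A or C) and (not B or not C)
    expr = lambda v: ((v["A"] or v["B"]) and (not v["A"] or v["C"])
                      and (not v["B"] or not v["C"]))
    print(brute_force_sat(["A", "B", "C"], expr))
    # {'A': True, 'B': False, 'C': True}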

Problems such as this arise in circuit design.

Additional NP Problems
The following problems all have a polynomial-time solution on a nondeterministic machine, but
an exponential-time solution on a deterministic machine. There are literally hundreds of
additional examples.
The travelling salesman problem

A salesman, starting in Harrisburg, wants to visit every capital city in the 48 continental United
States, returning to Harrisburg as his last stop. In what order should he visit the capital cities so
as to minimize the total distance travelled?

The Hamiltonian circuit problem

Every capital city has direct air flights to at least some other capital cities. Our intrepid salesman
wants to visit all 48 capitals, and return to his starting point, taking only direct air flights. Can he
find a path that lets him do this?

Equivalence of regular expressions

Do two distinct regular expressions represent the same language?

Intersection of finite automata

Given a set of finite automata M1, M2, M3, ..., Mn, all over the same alphabet A, is there some
string in A* that is accepted by all of these automata?

Linear programming

You have on hand X amount of butter, Y amount of flour, Z eggs, etc. You have cookie recipes
that use varying amounts of these ingredients. Different kinds of cookies bring different prices.
What mix of cookies should you make in order to maximize profits?

This type of problem is sufficiently important that entire college courses are devoted to it,
usually in the College of Business.

NP-Complete Problems
All of the known NP-complete problems have a remarkable characteristic: they are all reducible to one
another. What this means is that, given any two NP-complete problems X and Y,

 There exists a polynomial-time algorithm to restate a problem of type X as a problem of type Y,


and
 There exists a polynomial-time algorithm to translate a solution to a type Y problem back into a
solution for the type X problem.

This is what the "complete" refers to when we talk about NP-complete problems.

What this means is that, if anyone ever discovers a polynomial-time algorithm for any of these
problems, then there is an easily-derived polynomial-time algorithm for all of them. This leads to
the famous question:

Does P = NP?
No one has ever found a deterministic polynomial-time algorithm for any of these problems (or
the hundreds of others like them). However, no one has ever succeeded in proving that no
deterministic polynomial-time algorithm exists, either. The status for some years now is this:
most computer scientists don't think a polynomial-time algorithm can exist, but no one knows for
sure. This was a hot research topic for a while, but interest has died down on the problem, for the
simple reason that no one has made any progress (in either direction).

REVIEWS
Languages
A language is a set of strings over some finite alphabet. To be well-defined, a set
requires a membership criterion. Two kinds of membership criteria often used for
languages are grammars and automata. Other kinds of criteria are possible, such as
regular expressions and recursive functions.

Language types can be arranged according to the complexity required of the
membership criterion. The following is the classic Chomsky hierarchy.

Language                          Grammar                        Machine                               Example

Regular language                  Regular grammar (right-linear  Deterministic or nondeterministic     a*
                                  or left-linear grammar)        finite-state acceptor

Context-free language             Context-free grammar           Nondeterministic pushdown automaton   a^n b^n

Context-sensitive language        Context-sensitive grammar      Linear-bounded automaton              a^n b^n c^n

Recursively enumerable language   Unrestricted grammar           Turing machine                        Any computable function

If there exists a grammar of a given type for a language L, then L is no more
complex than the corresponding language type. It is possible that a simpler (less
powerful) grammar exists for the same language.

Grammars
A grammar G is a quadruple G = (V, T, S, P)
where

 V is a finite set of (meta)symbols, or variables.


 T is a finite set of terminal symbols.
 S ∈ V is a distinguished element of V called the start symbol.
 P is a finite set of productions (or rules).

A production of an unrestricted grammar has the form

(V ∪ T)+ → (V ∪ T)*

Productions of a context-sensitive grammar can be defined in either of two ways:

xVy → x(V ∪ T)+y   or   A → B, |A| ≤ |B|, A contains a variable

Productions of a context-free grammar have the form

V → (V ∪ T)*

A regular grammar is either a right-linear grammar or a left-linear grammar. A
right-linear grammar has productions of the forms

V → T*V    V → T*

whereas a left-linear grammar has productions of the forms

V → VT*    V → T*

Machines and Functions


 Q is a finite set of states,
 Σ is a finite set of symbols, the input alphabet,
 Γ is either a stack alphabet or a tape alphabet,
 δ is a transition function,
 q0 ∈ Q is the initial state,
 z ∈ Γ is the stack start symbol,
 # ∈ Γ is a symbol called blank,
 F ⊆ Q is a set of final states.

A deterministic finite-state acceptor (dfa) is a quintuple

M = (Q, Σ, δ, q0, F)
with transition function
δ: Q × Σ → Q
and extended transition function
δ*: Q × Σ* → Q

A nondeterministic finite-state acceptor (nfa) is the same quintuple with transition
function

δ: Q × (Σ ∪ {λ}) → 2^Q

A nondeterministic pushdown automaton (npda) is a 7-tuple

M = (Q, Σ, Γ, δ, q0, z, F)
with transition function
δ: Q × (Σ ∪ {λ}) × Γ → finite subsets of Q × Γ*

A Turing machine M (also a linear-bounded automaton, or lba) is a 7-tuple

(Q, Σ, Γ, δ, q0, #, F)
with partial transition function
δ: Q × Γ → Q × Γ × {L, R}

Accepting Strings
The (regular) language accepted by a dfa M is

L(M) = {w ∈ Σ*: δ*(q0, w) ∈ F}.

The (regular) language accepted by an nfa M is

L(M) = {w ∈ Σ*: δ*(q0, w) ∩ F ≠ ∅}

The (context-free) language accepted by an npda M is

L(M) = {w ∈ Σ*: (q0, w, z) ⊢* (p, λ, u), p ∈ F, u ∈ Γ*}.

An lba is a Turing machine with a limited amount of tape (a linear function of the size
of the input). Lbas accept context-sensitive languages.

The language accepted by a Turing machine M is

L(M) = {w ∈ Σ+: q0w ⊢* xi qf xj for some qf ∈ F, xi, xj ∈ Γ*}

Pumping Lemmas
Regular languages

If L is an infinite regular language, then there exists some positive integer m such that
any string w ∈ L whose length is m or greater can be decomposed into three parts,
xyz, where

 |xy| is less than or equal to m,
 |y| > 0,
 w_i = xy^iz is also in L for all i = 0, 1, 2, 3, ....

Context-free languages

Let L be an infinite context-free language. Then there is some positive integer m such
that, if S is a string of L of length at least m, then

 S = uvwxy (for some u, v, w, x, y)
 |vwx| ≤ m
 |vx| ≥ 1
 uv^i wx^i y ∈ L

for all nonnegative values of i.

Usage

Pumping lemmas are used to show that a given language is not of type
(regular/context-free). The argument goes:

 Assume the language is (regular/context-free).


 Choose a string of the language.
 Pump the string to obtain another that, by the pumping lemma, must also be
in the language.
 Show that the pumped string is not in the language.
 Conclude that the language cannot be (regular/context-free).

 Halting Problem, P=NP?

 If we could build a machine that determines, for any Turing machine T and
input W, whether T halts, then we could build the following impossible
machine:

     function Impossible:
         return LoopIfHaltsOnItself (LoopIfHaltsOnItself);

     function LoopIfHaltsOnItself (M):
         return LoopIfHalts (M, M);

     function LoopIfHalts (M, w):
         if WillHalt (M, w) then
             while true do {}
         else
             return false;

 A nondeterministic polynomial-time complete (NP-complete) problem is one
that can be solved in polynomial time on a nondeterministic Turing machine,
but apparently requires exponential time on a deterministic Turing machine.
Problems that require exponential time are called intractable.
 NP-complete problems can be reduced to one another in polynomial time, so if
one such problem has a polynomial-time solution, they all do.
