You are on page 1of 66

VYSYA COLLEGE, SALEM-103.

NAME : A. KAJA MOHIDEEN CLASS : I M.Sc. CS


SUBJECT : THEORY OF AUTOMATA UNITS : 5
AUGUST SETEMBER OCTOBER
F!"# : 11$0%$0%
T" : 1&$0%$0%
UNIT ' I
A()"#*)* T+,"!-
- Introduction
- Structural representation
- Automata and complexity
- Alphabets
- Strings
- Languages
- Problems
F!"# : 1%$0%$0%
T" : 31$0%$0%
UNIT ' I .C"/)..0
F1/1), A()"#*)*
- Introduction
- Deterministic finite automata
- Non-deterministic finite
automata
A2231c*)1"/
- Text search
- Finite automata ith !psilon
transition
. T 0
F!"# : 04$05$0%
T" : 15$05$0%
UNIT ' II
R,6(3*! E72!,881"/8
- Finite automata and regular
expression
- Applications of regular
expressions
- Algebraic las of regular
expressions
- Pro"ing languages not to be
regular
- Decision properties of regular
languages
- !#ui"alence and minimi$ation
of automata
- %oore and %ealy machines
. T 0
F!"# : 44$05$0%
T" : 30$05$0%
UNIT ' III
C"/),7) F!,, G!*##*!
- Definition
- Deri"ations using a grammar
- Leftmost and rightmost
deri"ation
- The language of a grammar
- Sentential forms
- Parse trees
(8+ 9":/ *()"#*)*
- Definition
- Languages of a PDA
- !#ui"alence of PDAs and
&F's
- Deterministic PDA
. T0
F!"# : 03$10$0%
T" : 0;$10$0%
UNIT ' IV
T(!1/6 M*c+1/,
- Introduction
- Notations
- Descriptions
- Transition diagrams
- Languages
- Turing machines and halting(
F!"# : 10$10$0%
T" : 1;$10$0%
UNIT ' IV .C"/)..0
- Programming techni#ues of
Turing machines
- %ultitape Turing machines
- )estricted Turing machines
- Turing machines and
computers
. T 0
F!"# : 40$10$0%
T" : 31$10$0%
UNIT ' V
I/)!*c)*<3, !"<3,#8
- The classes P and NP
- The NP complete problem
- &omplements of languages in
NP
- Problem sol"able in
polynomial space
. T 0
U N I T - I
=+*) 18 )+, 2(!2"8, "> 8)(9-1/6 *()"#*)* )+,"!-? ."!0 .5 #*!@80
=+*) 18 )+, /,,9 )" 8)(9- A()"#*)* )+,"!-?
Automata theory is the study of abstract computing de"ices( There are se"eral reasons hy the study of
automata and complexity is an important part of the core of computer science(
I/)!"9(c)1"/ )" F1/1), A()"#*)*
Finite automata are a useful model for many important *inds of hardare and softare( Let us list some of
the most important *inds+
Softare for designing and chec*ing the beha"ior of digital circuits(
The ,lexical analy$er- of a typical compiler. that is. the compiler component that brea*s the input
text into logical units. such as identifiers. *eyords and punctuations(
Softare for scanning large bodies of text. such as collections of eb pages. to find occurrences
of ords. phrases. or other patterns(
Softare for "erifying systems of all types that ha"e a finite number of distinct states. such as
communications protocols or protocols for secure exchange of information(
There are many systems or components. contains finite number of ,states-( The purpose of a state is to
remember the rele"ant portion of the system/s history( The ad"antage of ha"ing only a finite number of
states is that e can implement the system ith a fixed set of resources(
E7*#23,
The simplest finite automaton is an on0off sitch( The de"ice remembers hether it is the ,on- state or the
,off- state. and it allos the user to press a button hose effect is different. depending on the state of the
sitch( That is. if the sitch is in the off state. then pressing the button changes it to the on state. and if the
sitch is in the on state. then pressing the same button turns it to the off state(
Fig+ A finite automaton modeling an on0off sitch
As for all finite automata. the states are represented by circles1 in this example. e ha"e named the states on
and off( Arcs beteen states are labeled by ,inputs-. hich represent external influences on the system(
2ere. both arcs are labeled by the input Push. hich represents a user pushing the button( The intent of the
to arcs is that hiche"er state the system is in. hen the push input is recei"ed it goes to the other state( the
state in hich the system is placed initially is called as the ,start state-.( In our example. the start state is off.
and e con"eniently indicate the start state by the ord Start and an arro leading to that state(
S)!(c)(!*3 R,2!,8,/)*)1"/8:
There are to important notations that are not automaton-li*e. but play an important role in the study of
automata and their applications(
G!*##*!8
">
>
"
/
push
push
start
These are useful models hen designing softare that processes data ith a recursi"e structure(
The best *non example is a ,parser-. the component of a compiler that deals ith the recursi"ely
nested features of the typical programming language. such as expressions-arithmetic. conditional and so
on(
For instance. a grammatical rule li*e ! ! 3 ! states that an expression can be formed by ta*ing any
to expressions and connecting them by a plus sign1 this rule is typical of ho expressions of real
programming languages are formed(
R,6(3*! E72!,881"/8
These also denote the structure of data. especially text strings( The style of these expressions differ
significantly from that of grammars( The 4NI5-style regular expression 67A-89 7a-$9 : 79 7A-89 7A-89/
represents capitali$ed ords folloed by a space and to capital letters( This expression represents patterns
in text that could be a city and state. e(g(. Ithaca N;( It misses multiord city names. such as Palo alto &A.
hich could be captured by the more complex expression
6<7A-89 7a-$9 : 79 7A-89 7A-89=/
hen interpreting such expressions. e only need to *no that 7A-89 represents a range of characters from
capital ,A- to capital ,8- and 79 is used to represent the blan* character alone( Also. the symbol : represents
,any number of- the preceding expression( Parentheses are used to group components of the expression1
they do not represent characters of the text described(
A()"#*)* */9 c"#23,71)-
Automata are essential for the study of the limits of computation( There are to important issues+
>hat can a computer do at all? This study is called ,decidability-. and the problems that can be sol"ed
by computer are called ,decidable-(
>hat can a computer do efficiently? This study is called ,intractability-. and the problems that can be
sol"ed by a computer using no more time than some sloly groing function of the si$e of the input are
called ,tactable-(
G1A, )+, c,/)!*3 c"/c,2)8 "> A()"#*)* )+,"!-. ."!0
E723*1/ <!1,>3- *<"() A32+*<,)8, S)!1/68, L*/6(*6,8 */9 !"<3,#8. .10 M*!@80
The most important definitions include the ,alphabet- <a set of finite symbols=. ,strings-<a list of symbols
from an alphabet= and ,language- < a set of strings from the same alphabet=(
A32+*<,)8
An alphabet is a finite. nonempty set of symbols( >e use the symbol @ for an alphabet( &ommon alphabets
include+
A( @ B CD. AE. the binary alphabet
F( @ B C a. b. GGG((.$E. the set of all loer-case letters(
H( The set of all AS&II characters. or the set of all printable AS&II characters(
S)!1/68
A string <or sometimes ord= is a finite se#uence of symbols chosen from some alphabet( For eg(. DAADA is a
string from the binary alphabet @ B CD. AE( The string AAA is another string chosen from this alphabet(
T+, E#2)- S)!1/6
The empty string is the string ith $ero occurrences of symbols( This string. denoted . is a string that may
be chosen from any alphabet hatsoe"er(
L,/6)+ "> * S)!1/6
It is often useful to classify strings by their length. that is. the number of positions for symbols in the string(
For instance. DAADA has length I( The standard notation for the length of a string is J J(
For eg(. J DAA J B H and J J BD(
":,!8 "> */ A32+*<,)
If @ is an alphabet. the set of all strings of a certain length from that alphabet is expressed by using an
exponential notation( >e define @
*
to be the set of strings of length *. each of hose symbols is in @(
!g+ Note that @
D
BCE. regardless of hat alphabet @ is( That is. is the only string hose length is D(
If @ B CD. AE.
then @
A
B CD. AE.
@
F
B CDD.DA.AD.AAE.
@
H
B C DDD. DDA.DAD.DAA.ADD.ADA.AAD.AAAE G((
There is a slight confusion beteen @ and @
A
( The former is an alphabet1 its members D and A are symbols(
The latter is a set of strings1 its members are the strings D and A. each of hich is of length A(
The set of all strings o"er an alphabet is denoted by
:
(
For instance. CD. AE
:
B C. D. A. DD. DA. AD. AA. DDD. G( E(
That is
:
B
D

A

F
( ( (
The set of nonempty strings from alphabet is denoted by
3
(
Thus the e#ui"alences are
o
3
B
A

F

H
G(
o
:
B
3
CE
C"/c*),/*)1"/ "> 8)!1/68
Let x and y be strings( Then xy denote the concatenation of x and y. that is. the string formed by ma*ing a
copy of x and folloing it by a copy of y( If x is the string composed of i symbols
x B a
A
a
FGGGG(
a
i
and y is the string composed of K symbols
y B b
A
b
FGGGG(
b
K
.
then xy is the string of length i3K1
xy B a
A
a
F
GG(a
i
b
A
b
F
GG(b
K
(
E6: Let x B DAADA and y B AAD( Then xy BDAADAAAD and yx B AADDAADA(
For any string . the e#uations B B
L*/6(*6,8
A set of strings chosen from some @:. is called a language. denoted by L(
The language can be expressed as L @:(
For any programming language. the legal programs are a subset of the possible strings that can be
formed from the alphabet of the language( This alphabet is a subset of the AS&II characters(
For example consider the folloing languages+
o The language of all strings consisting of n D/s folloed by n A/s. for some n D1
C. DA.DDAA.DDDAAA.GGGE(
o The set of strings of D/s and A/s ith an e#ual number of each+
C. DA. AD. DDAA. DADA. ADDA.GG(E
o The set of binary numbers hose "alue is a prime+ CAD. AA. ADA. AAA. ADAA. GGG(E
o @: is a language for any alphabet @(
o . the empty language. is a language o"er any alphabet(
o CE. the language consisting of only the empty string. is also a language o"er any alphabet(
Notice that B CE1 the former has no strings and the latter has one string(
!"<3,#8
In automata theory. a problem is the #uestion of deciding hether a gi"en string is a member of some
particular language( A ,problem- can be expressed as a membership in a language( If @ is an alphabet. and L
is a language o"er @. then the problem L is+
'i"en a string in @:. decide hether or not is in L(
E6: The problem of testing can be expressed by the language L
p
consisting of all binary strings hose
"alue as a binary number is a prime( That is. gi"en a string of D/s and A/s say ,yes- if the string is the
binary representation of a prime and say ,no- if not(
Lne potentially unsatisfactory aspect of our definition of ,problem- is that one commonly thin*s of
problems not as decision #uestions but as re#uests to compute or transform some input( The techni#ue of
shoing one problem hard by using its supposed efficient algorithm to sol"e efficiently another problem
that is already *non to be hard is called a ,reduction- of the second problem to the first(
E723*1/ 1/ 9,)*13 *<"() D,),!#1/18)1c F1/1), A()"#*)*. ."!0
D18c(88 *<"() D,),!#1/18)1c F1/1), A()"#*)*. .10 #*!@80
The term ,deterministic- refers to the fact that on each input there is one and only one state to hich the
automaton can transition from its current state( A deterministic finite automaton consists of+
A( A finite set of states. often denoted M(
F( A finite set of input symbols. often denoted @(
H( A transition function that ta*es as arguments a state and an input symbol and returns a state(
N( A start state. one of the states in M(
I( A set of final or accepting states F( The set F is a subset of M(
A Deterministic Finite Automaton ill often be referred to by its acronym+ DFA( DFA in ,fi"e-tuple-
notation is gi"en as.
A B < M. @. . #
o
. F=
>here A is the name of the DFA. M is its set of states. @ is its input symbols. its transition function. #
o
its
start state. and F its set of accepting states(
L*/6(*6, "> )+, DFA
The ,language- of the DFA is the set of all strings that the DFA accepts(
Suppose a
A
.a
FGGGG
a
n
is a se#uence of input symbols( >e start out ith the DFA in its start state. #
o
(
The transition function . say <#
o
. a
A
= B #
A
to find the state that the DFA enters after processing the first
input symbol a
A
(
Then process the next input symbol a
F
. by e"aluating <#
A
. a
F
=1 continue in this manner. finding states
#
H
.#
NGG((
#
n
such that <#
i-A
. a
i
= B #
i
for each i(
If #
n
is a member of F. then the input a
A
.a
FGGGG
a
n
is accepted. and if not then it is ,reKected-(
E7*#23,:
Let us formally specify a DFA that accepts all and only the strings of D/s and A/s that ha"e the se#uence DA
somehere in the string( >e can rite this language as+
L B C 0 is of the form xDAy for some strings x and y consisting of D/s and A/s only E
Another e#ui"alent description. using parameters x and y to the left of the "ertical bar. is+
CxDAy0x are any strings D/s and A/sE
!xamples of strings in the language include DA. AADAD. and ADDDAA( !xamples of strings not in the language
include . D and AAADDD(
S1#23,! N")*)1"/8 >"! DFAB8
Specifying a DFA as a fi"e-tuple ith a detailed description of the transition function is both tedious and
hard to read( There are to preferred notations for describing automata+
A( A transition diagram hich is a graph
F( A transition table. hich is a tabular listing of the functions. hich by implication tell us the set of
states and the input alphabet(
T!*/81)1"/ D1*6!*#8:
A transition diagram for a DFA A B <M. @. . #
D
. F= is a graph defined as follos+
A( For each state in M there is a node
F( For each state # in M and each input symbol a in @. let <#. a= B p( then the transition diagram has an
arc from node # to node p. labeled a( If there are se"eral input symbols that cause transitions from #
to p. then the transition diagram can ha"e one arc. labeled by the list of these symbols(
H( There is an arro into the start state #
o
. labeled start( This arro does not originate at any node(
N( Nodes corresponding to accepting states are mar*ed by a double circle( States not in F ha"e a single
circle(
E7*#23,
In the figure(. gi"en belo the three nodes that corresponds to the three states( There is a start arro entering
the start state. #
o
. and the one accepting state. #
A
. is represented by a double circle( Lut of each state is one
arc labeled D and one arc labeled A(
The transition diagram for the DFA accepting all strings ith a substring DA
T!*/81)1"/ )*<3,8
A transition table is a con"entional. tabular representation of a function li*e that ta*es to arguments and
returns a "alue( The ros of the table correspond to the states. and the columns correspond to the inputs( The
entry for the ro corresponding to state # and the column corresponding to input a is the state <#. a=(
E7*#23,
The to features of a transition table mar*ed belo ith the start state being mar*ed ith
an arro. and the accepting states are mar*ed ith a star( Since e can deduce the sets of
states and input symbols by loo*ing at the ro and column heads. e can no read from
the transition table all the information e need to specify the finite automaton uni#uely(

E7),/91/6 )+, )!*/81)1"/ >(/c)1"/ )" 8)!1/68:
In terms of the transition diagram. the language of a DFA is the set of labels along all the paths that lead
from the start state to any accepting state(
>e define an extended transition function that describes hat happens hen e start in any state and follo
any se#uence of inputs( If is our transition function. then the extended transition function constructed from
ill be (The extended transition function is a function that ta*es a state # and a string and returns a
state p O the state that the automaton reaches hen starting in state # and processing the se#uence of inputs
( >e define by induction on the length of the input string. as follos+
BASIS <#. = B #( That is. if e are in state # and read no inputs. then e are still in state #(
D A
#
D
#
F
#
D
: #
A
#
A
#
A
#
F
#
F
#
A
INDUCTION Suppose is a string of the form xa1 that is. a is the last symbol of . and x is the
string consisting of all but the last symbol( For eg(. B AADA is bro*en into x B AAD
and a BA( Then <#. = B <<#. x=. a=
To compute <#.=. first compute <#. x=. the state that the automaton is in after processing all but the last
symbol of ( Suppose this state is p1 that is <#. x= B p( Then <#. = is hat e get by ma*ing a transition
from state p on input a. the last symbol of (That is. <#. = B <p. a=
L*/6(*6, "> * DFA:
>e can define the language of a DFA A B<M. . .#
D
. F=( This language is denoted L<A=. and is defined by
L<A= B C 0<#
D
. = is in FE
That is. the language of A is the set of strings that ta*e the start state #
D
to one of the accepting states( If L
is L<A=b for some DFA A. then e say L is a regular language(
D,816/ * DFA )" *cc,2) )+, 3*/6(*6,
L C D : 0 : +*8 <")+ */ ,A,/ /(#<,! "> 0B8 */9 */ ,A,/ /(#<,! "> 1B8E ."!0 G1A,
)+, D,),!#1/18)1c F1/1), A()"#*)* >"! )+, L*/6(*6,
L C D :0: +*8 <")+ */ ,A,/ /(#<,! "> 0B8 */9 */ ,A,/ /(#<,! "> 1B8E .5 #*!@80
The state of the DFA is to count both the number of D/s and the number of A/s. but count then modulo F( That
is. the state is used to remember hether the number of D/s seen so far is e"en or odd. and also to remember
hether the number of A/s seen so far is e"en or odd( There are thus four states. hich can be gi"en the
folloing interpretations+
F
0
+ Poth the number of D/s seen so far and the number of A/s seen so far are e"en(
F
1
+ The number of D/s seen so far is e"en. but the number of A/s seen so far is odd(
F
4
+ The number of A/s seen so far is e"en. but the number of D/s seen so far is odd(
F
3
+ Poth the number of D/s seen so far and the number of A/s seen so far are odd(
State #
D
is both the start state and the lone accepting state( It is the start state. because before reading any
inputs. the numbers of D/s and A/s seen so far are both $ero. and $ero is e"en( It is the only accepting state.
because it describes exactly the condition for a se#uence of D/s and A/s to be in language L( >e no *no
almost ho to specify the DFA for language L( It is
A B <C #
D
. #
A.
#
F.
#
H
E
.
CD.AE. . #
D
. C#
D
E= here the transition function is described by the transition diagram
!ach input D causes the state to cross the hori$ontal. dashed line( Thus. after seeing an e"en number of D/s
e are alays abo"e the line. in state #
D
or #
A
hile after seeing an odd number of D/s e are alays belo
the line. in state #
F
or #
H
( Li*eise. e"ery A causes the state to cross the "ertical. dashed line( Thus. after
seeing an e"en number of A/s. e are alays to the left. in state #
D
or #
A
hile after seeing an odd number of
D/s e are alays belo the line. in state #
F
or #
H
( Li*eise. e"ery A causes the state to cross the "ertical.
dashed line( Thus. after seeing an e"en number of A/s. e are alays to the left. in state #
D
or #
F
. hile after
seeing an odd number of A/s e are to the right. in state #
A
or #
H
(
>e can also represent this DFA by a transition table(
2ere. e illustrate the construction of (Suppose the input is AADADA( Since this string has e"en number of
D/s and A/s both. e expect it is in the language( Thus. e expect <#
D
. AADADA= B #
D
. since #
D
is the only
accepting state( The chec* in"ol"es computing <#
D
. = for each prefix of AADADA. starting at and going
in increasing si$e( The summary of this calculation is+

<#
D
. = B #
D

<#
D
. A= B #
D
B <<#
D
. =.A= B <#
D
.A= B #
A

<#
D
. AA= B <<#
D
. A=.A= B <#
A
.A= B #
D

<#
D
. AAD= B <<#
D
. AA=.D= B <#
D
.D= B #
F

<#
D
. AADA= B <<#
D
. AAD=.A= B <#
F
.A= B #
H

<#
D
. AADAD= B <<#
D
. AADA=.D= B <#
H
.D= B #
A

<#
D
. A= B #
D
B <<#
D
. AADAD=.A= B <#
A
.A= B #
D

E723*1/ *<"() N"/-9,),!#1/18)1c F1/1), A()"#*)*. ."!0 .10 #*!@80
D18c(88 1/ 9,)*13 *<"() NFA?
A non-deterministic finite automaton <NFA= has the poer to be in se"eral states at once( This ability is
often expressed as an ability to ,guess- something about its input( For instance. hen the automaton is used
to search for certain se#uences of characters<eg(. *eyords= in a long text string. it is helpful to ,guess- that
e are at the beginning of one of those strings and use a se#uence of states to do nothing but chec* that the
string appears. character by character(
A/ I/>"!#*3 A1,: "> NFA:
Li*e DFA. NFA has a finite set of states. a finite set of input symbols. one start state and a set of accepting
states( It also has a transition function. hich e shall commonly call ( The difference beteen the DFA
and NFA is in the type of ( For the NFA. is a function that ta*es a state and input symbol as arguments.
but returns a set of $ero. one or more states(
E7*#23,
A non-deterministic finite automaton. hose Kob is to accept all and only the strings of D/s and A/s that end in
DA( State #
D
is the start state and e thin* of the automaton as being in state #
D
hene"er it has not yet
,guessed- that the final DA has begun( It is alays possible that the next symbol does not begin the final DA.
e"en if that symbol is D( Thus. state #
D
may transition to itself on both D and A(


Fig<a= An NFA accepting all strings that end in DA
2oe"er. if the next symbol is D. this NFA also guesses that the final DA has begun( An arc labeled D thus
leads from #
D
to state #
A
( Notice that there are to arcs labeled D out of #
D
( The NFA has the option of going
either to #
D
or to #
A
and in fact it does both( In state #
A
. the NFA chec*s that the next symbol is A. and if so. it
goes to state #
F
and accepts(
Notice that there is no arc out of #
A
labeled D. and there are no arcs at all out of #
F
( In these situations. the
thread of the NFA/s existence corresponding to those states simply ,dies- although other threads may
continue to exist(
The folloing fig<b= suggests ho an NFA processes inputs( >e ha"e shon hat happens hen te
automaton of fig<a= recei"es the input se#uence DDADA( It starts in only its start state. #
D
(>hen the first D is
#
D
#
D
#
D
Start D
A
D. A
read. the NFA may go to either state #
D
or state #
A
. so it does both( These to threads are suggested by the
second column in the fig<b=.
Fig<b= The states of an NFA is in during the processing of input se#uence DDADA
Then. the second D is read( State #
D
may again go to both #
D
and #
A
( 2oe"er. state #
A
has no transition on D.
so it ,dies-( >hen the third input. aA. occurs. e must consider transitions from both #
D
and #
A
( >e find that
#
D
goes only to #
D
on A. hile #
A
goes only to #
F
( Thus. after reading DDA. the NFA is in states #
D
and #
F
( Since
#
F
is an accepting state. the NFA accepts DDA(2oe"er. the input is not finished( The fourth input. aD. causes
#
F
/s thread to die. hile #
D
goes to both #
D
and #
A
(The last input. aA. sends #
D
to #
D
and #
A
to #
F(
Since e are
again in an accepting state. DDADA is accepted(
D,>1/1)1"/ "> N"/9,),!#1/18)1c F1/1), A()"#*)*:
An NFA is represented essentially li*e a DFA+
A B <M. . .#
D.
F=
>here+
A( M is a finite set of states
F( is a finite set of input symbols
H( #
D
a member of M. is the start state
N( F. a subset of M. is the set of final <or accepting= states(
I( . the transition function is a function that ta*es a state in M and an input symbol in as arguments
and returns a subset of M(
E7*#23,
The NFA of fig<a= can be specified formally as
<C #
D
. #
A.
#
F.
E. C D.AE. . #
D
. C #
F
E=
here the transition function is gi"en by the transition table as follos+
0 1
#
D
C #
D.
#
A
E C #
D
E
#
A
C #
F
E
: #
F
Notice that the transition tables can be used to specify the transition function for an NFA as ell as for a
DFA( The only difference is that each entry in the table for the NFA is a set. e"en if the set is a singleton<has
one member=( >hen there is no transition at all from a gi"en state on a gi"en input symbol. the proper entry
is . the empty set(
T+, E7),/9,9 T!*/81)1"/ F(/c)1"/:
>e need to extend the transition function of an NFA to a function that ta*es a state # and a string of input
symbols . and returns the set of states that the NFA is in if it starts in state # and processes the string (
<#, = is the column of states found after reading . if # is the lone state in the first column( Formally e
define for an NFA/s transition function by+
BASIS <#. = B C#E( That is. ithout reading any input symbols. e are in the state e began
in(
INDUCTION Suppose : is of the form B xa. here a is the final symbol of and x is the rest of
( Also suppose that <#, x= B Cp
A
. p
F
. GG( p
*
E( Let
*
<p
i
, a= B Cr
A
. r
F
.GGGG(r
m
E
iBA
Then <#, = BCr
A
. r
F
.GGGG(r
m
E( Less formally. e compute <#,= by first computing <#, x=. and by
then folloing any transition from any of these states that is labeled a(
E7*#23,
Let us use to describe the processing of input DDADA by the NFA of fig<a=( A summary of the steps is+
<#
D
, = B C#
D
E
<#
D
, D= B <#
D
, D= B C #
D
. #
A
E
<#
D
, DD= B <#
D
, D= <#
A.
D= B C#
D
. #
A
EBC#
D
. #
A
E
<#
D
, DDA= B <#
D
, A= <#
A.
A= B C#
D
EC#
F
EBC#
D
. #
F
E
<#
D
, DDAD= B <#
D
, D= <#
A.
D= B C#
D
. #
A
EBC#
D
. #
A
E
<#
D
, DDADA= B <#
D
, D= <#
A.
A= B C#
D
EC#
F
EBC#
D
. #
F
E
Line<A= is the basis rule( >e obtain line<F= by applying to the state. #
D
. that is in the pre"ious set. and get
C#
D
. #
A
E as a result( Line<H= is obtained by ta*ing the union o"er the to states in the pre"ious set of hat e
get hen e apply to them ith input D( That is. <#
D
, D=BC#
D
. #
A
E. hile <#
D
, D= B ( For line<N=. e ta*e
the union of <#
D
, A= B C#
D
E and <#
D
, A= BC#
F
E( Lines<I= and <Q= are similar to lines<H= and <N=(
T+, L*/6(*6,8 "> */ NFA:
An NFA accepts a string if it is possible to ma*e any se#uence of choices of next state. hile reading the
characters of . and go from the start state to any accepting state( The fact that other choices using the input
symbols of lead to a non accepting state. or do not lead to any state at all<i(e(. the se#uence of states ,dies-
=. does not pre"ent from being accepted by the NFA as a hole( Formally. if A B < M. . . #
D
. F= is an
NFA. then
L<A= B C0 <#
D
. = F E
That is. L<A= is the set of strings in : such that <#
D
. = contains at least one accepting state(
G1A, * <!1,> /"), "/ )+, ,F(1A*3,/c, "> D,),!#1/18)1c */9 N"/-9,),!#1/18)1c F1/1), A()"#*)*? ."!0
=+*) 9" -"( #,*/) <- ,F(1A*3,/c,? E723*1/ ,F(1A*3,/c, "> DFA */9 NFA. .10#*!@80
!"ery language that can be described by some NFA can also be described by some DFA( The DFA has about
as many states as the NFA( Although it often has more transitions( In the orst case. hoe"er. the smallest
DFA can ha"e F
n
states hile the smallest NFA for the same language has only n states(
The proof that DFA/s can do hate"er NFA/s can do in"ol"es an important ,construction- called the subset
construction because it in"ol"es constructing all subsets of the set of states of the NFA(
The subset construction starts from an NFA N B< M
N
. .
N
. #
D
. F
N
=( Its goal is the description of a DFA D B
<M
D
. .
D
. C#
D
E. F
D
= such that L<D= B L<N=( The input alphabets of the to automata are the same. and the
start state of D is the set containing only the start state of N( The other components of D are constructed as
follos(

M
D
is the set of subsets of M
N
1 i(e(. M
D
is the poer set of M
N
( Note that if M
N
has n states. then M
D
ill ha"e F
n
states( Note all these states are accessible from the start state of M
(
Inaccessible states
can be ,thron aay-. so effecti"ely. the number of states of D may much smaller than F
n
(

F
D
is the set of subsets S of M
N
such that S F
N
( That is. F
D
is all sets of N/s states that
include at least one accepting state of N(

For each set S M


N
and for each input symbol a in .

D
<S. a= B
N
<p. a=

p in S
That is. to compute
D
<S. a= e loo* at all the states p in S. see hat states N goes from p on input a. and
ta*e the union of all those states(
E7*#23,
Let N be the automaton of the abo"e figure that accepts all strings that end in DA( Since N/s set of states is
C#
D
. #
A
.#
F
E. the subset construction produces a DFA ith F
H
B R states. corresponding to all the subsets of
these three states( The folloing figure shos the transition table for these eight states(
This transition table belongs to a deterministic finite automaton( !"en though the entries in the table
are sets. the states of the constructed DFA are sets( >e can in"ent ne names for these states eg(. A for .
P for C#
D
E etc(. The DFA transition table defines exactly the same automaton but ma*es clear the point that
the entries in the table are single states of the DFA(
0 1
A A A
B E B
C A D
G D A A
E E F
G F E B
G G A D
G H ! F
)enaming the States
Lf the right states in table-F starting in the start state P. e can only reach states P. ! and F( The other fi"e
states are inaccessible from the start state( Those states are remo"ed and the table is reconstructed and the
DFA diagram is dran ith that table as shon in the figure(
0 1

DF
0
E C#
D
. #
A
E C#
D
E
DF
1
E

C#
F
E
GDF
4
E
DF
0
, F
1
E C#
D
. #
A
E C#
D
. #
F
E
GDF
0
, F
4
E C#
D
. #
A
E C#
D
E
GDF
1
, F
4
E

C#
F
E
GDF
0
, F
1
, F
4
E C#
D
. #
A
E C#
D
. #
F
E
Subset &onstruction from NFA
#
D
#
D
#
D
Start D
A
D. A
E723*1/ +": )" >1/9 8)!1/6 1/ * ),7)? ."!0
E723*1/ )+, 8),28 1/A"3A,9 1/ >1/91/6 * 8)!1/6 1/ * ),7)? .10 #*!@80
A common problem in the age of the eb and other on-line text repositories is the folloing( 'i"en a set of
ords. find all documents that contain one of those ords( The search engine uses a particular technology.
called in"erted indexes. here for each ord appearing on the eb a list of all the places here that ord
occurs is stored( %achines ith "ery large amounts of main memory *eep the most common of these lists
a"ailable. alloing many people to search for documents at once(
In"ented-index techni#ues do not ma*e use of finite automata. but they also ta*e "ery large amounts of main
memory *eep the most common of these lists a"ailable. alloing many people to search for documents at
once( In"erted-index techni#ues do not ma*e use of finite automata. but they also ta*e "ery large amounts of
time for cralers to copy the eb and set up the indexes( There are a number of related applications that are
unsuited for in"erted indexes. but are good applications for automaton-based techni#ues( The characteristics
that ma*e an application suitable for searches that use automata are+
A( The repository on hich the search is conducted is rapidly changing( For eg+
a= !"eryday. nes analysts ant to search the day/s on-line nes articles for rele"ant topics( For
eg(. a financial analyst might search for certain stoc* tic*er symbols or names of companies(
b= A ,shopping robot- ants to search for the current prices charged for the items that its clients
re#uest( The robot ill retrie"e current catalog pages for the eb and then search those pages
for ords that suggest a price for a particular item(
F( The documents to be searched cannot be cataloged( For eg(. Ama$on(com does not ma*e it easy for
cralers to find all the pages for all the boo*s that the company sells( )ather. these pages are
generated ,on the fly- in response to #ueries( 2oe"er. e could send a #uery for boo*s on a certain
topic. say ,finite automata-. and then search the pages retrie"ed for certain ords. eg(. ,excellent- in
a re"ie portion(
C"/8)!(c) */ NFA >"! ),7) 8,*!c+? ."!0
G1A, * 8+"!) /"), "/ ),7) 8,*!c+ >"! /"/ 9,),!#1/18)1c >1/1), *()"#*)*?.5 #*!@80
Suppose e are gi"en a set of ords. hich e shall call the *eyords. and e ant to find occurrences of
any these ords( In applications such as these. a useful ay to proceed is to design a non-deterministic finite
automaton. hich signals. by entering an accepting state. that it has seen one of the *eyord( The text of a
document is fed. one character at a time to this NFA. hich then recogni$es occurrences of the *eyords in
this text( There is a simple form to an NFA that recogni$es a set of *eyords(
A= There is a start state ith a transition to itself on e"ery input symbol. e(g(. e"ery printable AS&II
character if e are examining text( Intuiti"ely. the start state represents a ,guess- that e ha"e
not yet begun to see one of the *eyords. e"en if e ha"e seen some letters of one of these
ords(
F= For each *eyord a
A
a
F
GGGa
*
. there are * states. say #
A
#
F
. GG((#
*
( There is a transition from
the start state to #
A
on symbol a
A
. a transition from #
A
to #
F
on symbol a
F.
and so on( The state #
*
is
an accepting state and indicates that the *eyord a
A
a
FGGG(
a
*
has been found(
E6., suppose e ant to design an NFA to recogni$e occurrences of the ords eb and ebay( The transition
diagram for the NFA designed using the rule abo"e is in fig(F(AQ( State A is the start state. and e use to
C#
D
.
#
A
E
C#
D
.
#
F
E
C#
D
E
Start
D A
A D
A
D
stand for the set of all printable AS&II characters( States F through N ha"e the Kob of recogni$ing eb. hile
states I through R recogni$e ebay(
Lf course the NFA is not a program( >e ha"e to maKor choices for an implementation of this NFA(
A( >rite a program that simulates this NFA by computing the set of states it is in after reading each
input symbol( The simulation as suggested in fig<b=(
F( &on"ert the NFA to an e#ui"alent DFA using the subset construction( Then simulate the DFA
directly(
Some text processing programs. such as ad"anced form of the 4NI5 grep command<egrep and fgrep=
actually use a mixture of these to approaches( 2oe"er for our purpose con"ersion to a DFA is easy and is
guaranteed not to increase the number of states(
E723*1/ )+, c"/A,!81"/ "> NFA )" DFA (81/6 ),7) 8,*!c+? ."!0
G1A, 8+"!) /"),8 "/ c"/A,!81"/ "> NFA >!"# DFA >"!# "> ),7) 8,*!c+,8? .5 #*!@80
>hen e apply the subset construction to an NFA that as designed from a set of *eyords. e find that the
number of states of the DFA is ne"er greater than the number of states of the NFA( The rules for constructing
the set of NFA states is as follos.
a= If #
D
is the start state of the NFA. then C#
D
E is one of the states of the DFA(
b= Suppose p is one of the NFA states. and it is reached from the start state along a path hose symbols
are a
A
a
F
GG((a
m
( Then one of the DFA states is the set of NFA consisting of+
A(
#
D
F(
p
H(
!"ery other state of the NFA that is reachable from #
D
by folloing a path hose labels are a
suffix of a
A
a
F
G((a
m
. that is. any se#uence of symbols of the form a
K
a
K3A
GGGGG((a
m
(
Fig <c=( &on"ersion of the NFA to a DFA
There ill be one DFA state for each NFA state p( 2oe"er. in step <b=. to states may actually yield the
same set of NFA states. and thus become one state of the DFA( For eg(. if to of the *eyords begin ith
the same letter. say a. then the to NFA states that are reached from #
D
by an arc labeled a ill yield the
same set of NFA states and thus get merged in the DFA(
G1A, 8+"!) /"),8 "/ F1/1), A()"#*)* :1)+ E2813"/ T!*/81)1"/8 ."!0
E723*1/ *<"() E2813"/ T!*/81)1"/8 "/ F1/1), A()"#*)* :1)+ ,7*#23,? .10 #*!@80
The ne ,feature- is that e allo a transition on . the empty string( In effect. an NFA is alloed to ma*e a
transition spontaneously ithout recei"ing an input symbol(
U8,8 "> -T!*/81)1"/8:
>e shall begin ith an informal treatment of -NFA/s. using transition diagrams ith alloed as a label( In
the examples to follo. thin* of the automaton as accepting those se#uences of labels along paths from the
start state to an accepting state( 2oe"er. each along a path is ,in"isible-1 i(e(. it contributes nothing to the
string along the path(
E7*#23,
The folloing figure is an -NFA that accepts decimal numbers consisting of+
A( An optional 3 or O sign(
F( A string of digits(
H( A decimal point. and
N( Another string of digits( !ither this string of digits. or the string<F= can be empty. but at least one
of the to strings of digits must be nonempty(
Lf particular interest is the transition from #
D
to #
A
on any of . 3 or -(
Thus. state #
A
represents the situation in hich e ha"e seen the sign if there is one. but none of the
digits are decimal point(
State #
F
represents the situation here e ha"e Kust seen the decimal point. and may or may not ha"e
seen prior digits(
In #
N
e ha"e definitely seen at least one digit. but not the decimal point(
Thus. the interpretation of #
H
is that e ha"e seen a decimal point and at least one digit. either before
or after the decimal point(
T+, >"!#*3 N")*)1"/ >"! */ -NFA:
>e may represent an -NFA exactly as e do an NFA. ith one exception+ the transition function must
include information about transitions on ( Formally. e represent an -NFA A by A B <M. . . #
D
. F=. here
all components ha"e their same interpretation as for NFA. except that is no a function that ta*es as
arguments+
A( A state in M. and
F( A member of CE. that is. either an input symbol. or the symbol ( >e re#uire that . the symbol
for the empty string. cannot be a member of the alphabet . so no confusion results(
E7*#23,
The abo"e -NFA is formally represented as
! B<C#
D
. #
A
. GG(#
I
E. C.. 3. - .D. AG(SE. . #
D .
C#
I
E=
>here is defined by the transition table in figure
H,- . 0,1,I5
F
0
C#
A
E C#
A
E

F
1 C#
F
E C#
A.
#
N
E
F
4
C#
H
E
F
3
C#
I
E C#
H
E
F
&
C#
H
E

F
5

E2813"/ C3"8(!,8
-closure of a state is defined as finding e"ery state that can be reached from the current state along any path
hose arcs are all labeled ( Formal definition is as follos+
BASIS State # is in !&LLS!<#=
INDUCTION If state p is in !&LLS!<#=.and there is a transition from state p to state r lebeled .
then r is in !&LLS!<#=( %ore precisely. if is the transition function of the -NFA
in"ol"ed. and p is in !&LLS!<#=. then !&LLS!<#= also contains all the states in <p.
=(
E7*#23,
&onsider the folloing figure1 each state is its on -closure. ith to exceptions
!&LLS! <#
D
= B C#
D
. #
A
E
and !&LLS! <#
H
EBC#
H
. #
I
E
The reason is that there are only to -transitions. one that adds #
A
to !&LLS!<#
D
= and the other that adds #
I
to !&LLS!<#
H
=(
For this collection of states. hich may be part of some -NFA. e can conclude that
!&LLS!<A= B CA.F.H.N.QE
!ach of these states can be reached from state A along a path exclusi"ely labeled (
For eg(. state Q is reached by the path AFHQ( State T is not in !&LLS! <A=( Since. although it is
reachable from stateA. the path must use the arc NI that is not labeled(
The fact that state Q is also reached from state A along a path ANIQ that has non- transitions is
unimportant(
The existence of one path ith all labels is sufficient to sho state Q is in !&LLS! <A=(
=!1), 8+"!) /"),8 "/ )+, ,7),/9,9 )!*/81)1"/8 */9 3*/6(*6,8 >"! -NFAB8? ."!0
E723*1/ +": )" ,7),/9 )+, )!*/81)1"/8 */9 3*/6(*6,8 >"! -NFAB8 .10 #*!@80
The -closure allos us to explain easily hat the transitions of an -NFA loo* li*e hen gi"en a se#uence
of inputs( >e can define hat it means for an -NFA to accept its input(
Suppose that !B<M. . . #
D
. F= is an -NFA( >e first define . the extended transition function. to
reflect hat happens on a se#uence of inputs(
The intent is that <#.= is the set of states that can be reached along a path hose labels. hen
concatenated. form the string (
/s along this path do not contribute to (
BASIS <#.= B !&LLS!<#=( That is. if the label of the path is ( Then e can follo only -
labeled arcs extending from state #1 that is exactly hat !&LLS! does(
INDUCTION Suppose is of the form xa. here a is the last symbol of ( Note that a is a member
of 1 it cannot be . hich is not in ( >e compute <#. = as follos+
A( Let Cp
A
.p
F
.GG(p
*
E be <#.x=( That is. the p
i
/s are the states that e can reach from # folloing a path
labeled x( This path may end ith transitions labeled . and may ha"e other -transitions as ell(
F( Let
k
i A =

< p
i
.a= be the set Cr
A
.r
F
.GGG(r
m
E(That is. follo all transitions labeled a from states e
can
reach from # along paths labeled x( The r
K
/s by folloing -labeled arcs in step<H= belo(
H( Then <#.= B
*
!&LLS!<r
K
=(This closure step includes all the paths from # labeled . by
considering the possibility that there are additional -labeled arcs that can follo after ma*ing a
transition on the final ,real- symbol. a(
E7*#23,
Let us compute <#
D
. I(Q= for the abo"e -NFA <page no(AH=( A summary of the steps needed are as follos+
< #. = B !&LLS!<#
D
= B C#
D
.#
A
E
compute <#
D
. I= as follos+
A( First compute the transitions on input I from the states #
D
and #
A
that e obtained in
the calculation of <#
D
.=. abo"e( That is. e compute <#
D
.I= <#
A
.I= B C#
A
.#
N
E
F( Next. -closure the members of the set computed in step<A=(
H( >e get !&LLS!<#
A
= !&LLS!<#
N
= B C#
A
EC#
N
E B C#
A
.#
N
E(That set is <#
D
. I=( This
to-step pattern repeats for the next to symbols(
&ompute <#
D
. I.= as follos+
A( First compute < #
D
. .= < #
N
. .= B C#
F
EC#
H
E B C#
F
.#
H
E
F( Then compute < #
D
. I.= B !&LLS!<#
F
= !&LLS!<#
H
= B C#
F
EC#
H
.#
I
EB C#
F
.#
H
.#
I
E
&ompute < #
D
. I.Q= as follos+
A( First compute < #
F
. Q= < #
H
.Q= < #
I
.Q= B C#
H
EC#
H
E BC#
H
E
F( Then compute <#
D
. I(Q= B !&LLS!<#
H
= B C#
H
.#
I
E
E723*1/ +": )" ,31#1/*), -)!*/81)1"/8 :1)+ */ ,7*#23,. ."!0
G1A, * <!1,> /"), "/ ,31#1/*)1"/ "> -)!*/81)1"/8? .5 M*!@80
'i"en any -NFA. e can find a DFA D that accepts the same language as !( The construction e use is
"ery close to the subset construction. as the states of D are subsets of the states of !( The only difference is
that e must incorporate -transitions of !. hich e do through the mechanism of the -closure(
Let ! B <M
D
. .
!
. #
D
. F
!
=( Then the e#ui"alent DFA
DB <M
D
. .
D
. #
D
. F
D
= is defined as follos+-
A( M
D
is the set of subsets of M
!
( %ore precisely. e shall find that the only accessible states of D are
the -closed subsets of M
!
. that is. those sets SM
!
such that SB!&LLS!<S=(
F( #
D
B !&LLS!<#
D
=1 e get the start state of D by closing the set consisting of only the start state of !(
H( F
D
id those sets of states that contain at least one accepting state of !( That is. F
D
BCS0S is in M
D
and
SF
!
E(
N(
D
<S.a= is computed. for all a in and sets S in M
D
by+
a( Let S BC p
A
.p
F
GGp
*
=
b( &ompute
k
i A =

<p
i
. a=1 let this set be Cr
A
.r
F
.G((r
m
E

c( Then
D
<S.a= B
k
j A =

!&LLS!<r
K
=

Let us eliminate -transitions from the -NFA of the abo"e figure. hich e shall call ! in hat follos(
From !. e construct an DFA D. hich is shon in the folloing figure( Imagine that for each state shon
in fig(F(FF there are additional transitions from any state to on any input symbols for hich a transition is
not indicated( Also. the state has transitions to itself on all input symbols D.A.G((S(

Since the start state of ! is #
D
. the start state of D is !&LLS!<#
D
=. hich is C#
D
. #
A
=(
Lur first Kob is to find the successors of #
D
and #
A
on the "arious symbols in 1 note that these symbols
are the plus and minus signs. the dot. and the digits D through S(
Ln 3 and -. #
A
goes nohere in the NFA. hile #
D
goes to #
A
(
Thus. to compute
D
<C#
D
. #
A
E. 3= e start ith C#
A
E and -close it(
Since there are no -transitions out of #
A
. e ha"e
D
<C#
D
. #
A
E. 3= B C#
A
E( Similarly.
D
<C#
D
. #
A
E. -= B C#
A
E(
These to transitions are shon by one arc in the figure(
Next. e need to compute
D
<C#
D
. #
A
E. .=( Since #
D
goes nohere on the dot. and #
A
goes to #
F
in the NFA
e must -close C#
F
E(
As there are no -transitions out of #
F
. this state is its on closure. so
D
<C#
D
. #
A
E. .= BC#
F
E(
Finally. e must compute
D
<C#
D
. #
A
E. DE. as an example of the transitions from C#
D
.#
A
E on all the digits(
>e find that #
D
goes nohere on the digits. but #
A
goes nohere on the digits. but #
A
goes to both #
A
and
#
N
( Since neither of those states ha"e -transitions out. e conclude
D
<C#
D
. #
A
E. D= BC#
A.
#
N
Eand li*eise
for the other digits(
Since #
I
is the only accepting state of !. the accepting states of D are those accessible states contains #
I
(
>e see these to sets C#
H
. #
I
E and C#
F
. #
H
. #
I
E indicated by double circles in figure(
E/9 "> U/1) -1
U N I T - II
D,>1/, !,6(3*! ,72!,881"/ :1)+ ,7*#23,. ."!0
G1A, * <!1,> /"), "/ !,6(3*! ,72!,881"/8. .10
#*!@80
The algebraic description of the language is called as ,regular expressions-( )egular expressions offer a
declarati"e ay to express the strings e ant to accept( Thus. regular expressions ser"e as the input
language for many systems that process strings( !xamples include+
Search commands
Lexical analy$er generators
T+, "2,!*)"!8 "> R,6(3*! E72!,881"/8:
)egular expressions denote languages( For a simple example. the regular expression DA: 3AD
:
denotes the
language consisting of all strings that are either a single D folloed by any number of A/s or a single A
folloed by any number of D/s( Pefore describing the regular-expression notation. e need to learn the three
operations on languages that the operators of regular expressions represent( These operations are+
A( The (/1"/ of to languages L and %. denoted L %. is the set of strings that are in either L or
%. or both( For eg(. if LBCDDA.AD.AAAE and %BC.DDAE. then L% B C.AD.DDA.AAAE(
F( The c"/c*),/*)1"/ of languages L and % is the set of strings in L and concatenating it ith any
string in %( This operator is denoted either ith a dot or ith no operator at all(
For eg(. if LBCDDA.AD.AAAE and %BC.DDAE. then
L(%. or Kust L% is CDDA.AD.AAA.DDADDA.ADDDA.AAADDAE(
The first three strings in L% are the strings in L concatenated ith ( Since is the identity for
concatenation. the resulting strings are the same as the strings of L( 2oe"er. the last three strings
in L% are formed by ta*ing each string in L and concatenating it ith the second string in %.
hich is it ith the second string in %. hich is DDA(For instance. AD from L concatenated ith
DDA from % gi"es us ADDDA for L%(
H( The c3"8(!, of a language L is denoted L
:
and represents the set of those strings that can be
formed by ta*ing any number of strings from L. possibly ith repetitions and concatenating all of
them( For instance. if LBCD.AE. then L
:
is all strings of D/s and A/s( If LBCD.AAE. then L
:
consists of
those strings of D/s and A/s such that the A/s come in pairs( e(g(. DAA.AAAAD and but not DADAA or
ADA(
B(1391/6 R,6(3*! E72!,881"/8:
Algebras of all *inds start ith some elementary expressions. usually constants or "ariables( Algebras then
allo us to construct more expressions by applying a certain set of operators to these elementary expressions
and to pre"iously constructed expressions( >e can describe the regular expressions recursi"ely. as follos(
In this definition e not only describe hat the legal regular expressions are. but for each regular expression
!. e describe the language it represents. hich e denote L<!=(
BASIS
The basis consists of three parts(
A( The constants and are regular expressions. denoting the languages CE and respecti"ely( That
is. L<= B CE. and L<= B
F( If 6a/ is any symbol. then 6a/ is a regular expression( This expression denotes the language CaE(
That is. L<a= BCaE(
H( A "ariable usually capitali$ed and italic such as L. is a "ariable. representing any language(
INDUCTION
There are four parts to the inducti"e step. one for each of the three operators and one for the introduction of
parentheses(
A( If ! and F are regular expressions. then !3F is a regular expression denoting the union of L<!= and
L<F=( That is. L<!3F= B L<!=L<F=(
F( If ! and F are regular expressions then !F is a regular expression denoting the concatenation of
L<!= and L<F=( That is L<!F= B L<!=L<F=(
H( If ! is a regular expression. then !
:
is a regular expression. denoting the closure of L<!=( That is.
L<!:= B <L<!==:(
N( If ! is a regular expression. then <!=. a parenthesi$ed !. is also a regular expression. denoting the
same language as !( Formally. L<<!==BL<!=(
!,c,9,/c, "> R,6(3*!-E72!,881"/ "2,!*)"!8:
Li*e other algebras. the regular expression operators ha"e an assumed order of ,precedence- hich means
that operators are associated ith their operands in particular order( For instance. e *no that xy3$ groups
the product xy before the sum. so it is e#ui"alent to the parameteri$ed expression <xy=3$ and not to the
expression x<y3$=( For regular expressions. the folloing is the order of precedence for the operators(
A( The star <closure= operator is of highest precedence(
F( Next in precedence comes the concatenation or ,dot- operator( After grouping all stars to their
operands. e group concatenation operators to their operands( &oncatenation is an associati"e
operator. e"aluates the expression from the left( For instance. DAF is grouped <DA=F(
H( Finally. all unions <3 operators= are grouped ith their operands( Since union is also associati"e. it
is also e"aluates from the left(
E723*1/ >1/1), *()"#*)* :1)+ !,6(3*! ,72!,881"/8? ."!0
D,8c!1<, >1/1), *()"#*)* */9 !,6(3*! ,72!,881"/8? .10 M*!@80
>hile the regular-expression approach to describe languages is fundamentally different from the finite-
automaton approach. these to notations turn out to represent exactly the same set of languages. hich e
termed the ,regular languages-( In order to sho that the regular expressions define the same class. e must
sho that+
A( !"ery language defined by one of these automata is also defined by a regular expression( For this
proof. e assume the language is accepted by some DFA(
F( !"ery language defined by a regular expression is defined by one of these automata( For this part
of the proof. the easiest is to sho that there is an NFA ith -transitions accepting the same
language(
The abo"e figure shos all the e#ui"alences( An arc from class 5 to class ; means that e pro"e e"ery
language defined by class 5 is also defined by class ;( Since the graph is strongly connected e see that all
four classes are really the same(
F!"# DFAB8 )" R,6(3*! E72!,881"/8:
>e build expressions that describe sets of strings that label certain paths in the DFA/s transition diagram(
2oe"er. the paths are alloed to pass through only a limited subset of the states( In an inducti"e definition
of these expressions. e start ith the simplest expressions that describe paths that are not alloed to go
through any state1 i(e(. the expressions e generate at the end represent all possible paths(
E7*#23,: Let us con"ert the DFA to a regular expression( This DFA accepts all strings that ha"e at least one
D in them( To see hy. note that the automaton goes from the start state A to accepting state F as soon as it
sees an input D( The automaton then stays in state F on all input se#uences(

from the DFA. the basis expressions can be constructed as shon in the table(
For instance.
R
= D <
AA
has the term because the beginning and ending states are the same. state A( It
has the term A because there is an arc from state A to state A on input A(
As another example.
R
= D <
AF
is D because there is an arc labeled D from state A to state F( There is no
term because the beginning and ending states are different(
For a third example.
R
= D <
FA
B . because there is no arc from state F to state A(
No. construct the expressions by considering the state A( The rule for computing the expressions
Rij
= A <
are
Rij
= A <
B
Rij
= D <
3
Ri
= D <
A

R
= D <
AA
:
R j
= D <
A
The folloing table gi"es first the expressions computed by direct substitution into the abo"e formula. and
then a simplified expression that e can sho to represent the same language as the more complex
expression(
To understand the simplification. note the general principle that
if ) is any regular expression. then < 3)=
:
B )
:
(
In our case. e ha"e <3A=
:
BA
:
1 both expressions denote any number of A/s(
Also that <3A=A
:
B A
:
.denotes any number of A/s(
Thus. the original expression
R
= A <
AF
is e#ui"alent to D3A:D( This expression denotes the language
containing the string D and all strings consisting of a D preceded by any number of A/s( This language
is also expressed by the simpler expression A:D(
For any regular expression )+ ) B ) B( And 3) B )3 B)( That is. is the identity for union1
it results in the other expression hene"er it appears in a union(
Let us compute the expressions
Rij
= F <
( The inducti"e rule applied ith *BF gi"e us+
Rij
= F <
B
Rij
= A <
3
Ri
= A <
F

R
= A <
FF
:
R j
= A <
F
The expressions are gi"en in the folloing table(
The final regular expression e#ui"alent to the DFA is constructed by ta*ing the union of all the expressions
here the first state is the start state and the second state is accepting( In this eg(. ith A as the start state and
F as the only accepting state. e need only the expression
R
= A <
AF
(
This expression is 1
G
0.0H10
G
(
It is simple to interpret this expression( Its language consists of all strings that begin ith $ero or more A/s.
then ha"e a D. and then any string of D/s and A/s( In another ay. the language is all strings of D/s and A/s
ith at least one D(
E723*1/ )+, c"/A,!81"/ "> DFA )" !,6(3*! ,72!,881"/. ."!0 .10
#*!@80
D,8c!1<, )+, c"/c,2)8 "> ,31#1/*)1/6 8)*),8 1/ )+, c"/A,!81"/ "> DFA )" R,6(3*! ,72!,881"/?
The regular expressions can be constructed by eliminating the states of DFA(
>hen a state is eliminated. all the paths that ent through s no longer exist in the automaton(
Lnce a state is eliminated. the language ill also change( So an arc must be included to connect its
predecessor and successor(
This arc ill contain a string as a label(
)egular expression is used to represent the string(
Finally the language of the automaton is the union o"er all paths from the start state to an accepting
state of the language formed by concatenating the languages of the regular expressions along that
path(
The folloing figure<a= shos a generic state s about to be eliminated( The state s has predecessor states
#
A
.#
F
.GG(.#
*
and successor states p
A
.p
F.
GG(p
m
(
<a= <b=
Fig(<b= shos hat happens hen e eliminate state s(
All arcs in"ol"ing state s are deleted(
To compensate. e introduce. for each predecessor #
i
of s and each successor p
K
of s. a regular
expression that represents all the paths that start at #
i
. go to s. perhaps loop around s $ero or more
times(
The expression for these paths is M
i
s: P
K
( This expression is added to the arc from #
i
to p
K
(
If there as no arc #
i
p
K
. then first introduce one ith regular expression (
The strategy for constructing a regular expression from a finite automaton is as follos+
A( For each accepting state #. apply the abo"e reduction process to produce an e#ui"alent automaton
ith regular expression labels on the arcs(
F( !liminate all states except # and the start state #
D
(
H( If # U #
D
. then e shall be left ith a to-state automaton that loo*s li*e as in the figure(
N( The regular expression for the accepted strings can be described in "arious ays(
I( Lne is <)3S4:T=: S4:(
a( In explanation. e can go from the start state to itself any number of times. by folloing a
se#uence of paths hose labels are in either L<)= or L<S4:T=(
b( The expression S4:T represents paths that go to the accepting state "ia a path in L<S=.
perhaps return to the accepting state se"eral times using a se#uence of paths ith labels in
L<4=. and then return to the start state ith a path hose label is in L<T=(
c( Then e must go to the accepting state. ne"er to return to the start state. by folloing a
path ith a label in L<S=(
d( Lnce in the accepting state. e can return to it as many times as e li*e. by folloing a
path hose label is in L<4=(
Q( If the start state is also an accepting state. then e must also perform a state-elimination from the
original automaton that gets rid of e"ery state but the start state (>hen e do so. e are left ith a
one-state automaton that loo*s li*e in the folloing figure( The regular expression is )
:
(
T( The desired regular expression is the sum of all the expressions deri"ed from the reduced automata
for each accepting state. by rules <F= and <H=(
E6: Let us consider the NFA in the abo"e figure that accepts all strings of D/s and A/s such that either the
second or third position from the end has a A( Lur first step is to con"ert it to an automaton ith regular
expression labels( Since no state elimination has been performed. all e ha"e to do is replace the labels
,D.A- ith the e#ui"alent regular expression D3A( The result is shon in the folloing figure(
Let us first eliminate state P( Since this state is neither accepting nor the start state. it ill not be in any of
the reduced automata( State P has one predecessor. A. and one successor. &(
As a result. the expression on the ne arc from A to & is 3A
:
<D3A=(
To simplify. e first eliminate the initial . hich any be ignored in a union( The expression thus
becomes A
:
<D3A=( Note that the regular expression
:
is e#ui"alent to the regular expression (
Thus. A
:
<D3A= is e#ui"alent to A<D3A=(
No. e must branch. eliminating states & and D in separate reductions( To eliminate state &. and the
resulting automaton is as shon in the figure(
In terms of the generic to-state automaton. the regular expressions are+ )B D3A. SBA<D3A=<D3A=.
TB. and 4B( The expression 4
:
can be replaced by (
Also. the expression S4
:
T is e#ui"alent to (
The generic expression <)3S4
:
T=
:
S4
:
thus simplifies in this case to )
:
S. or .0H10
G
1.0H10.0H10( In
informal terms. the language of this expression is any string ending in A. folloed by to symbols
that are each either D or A(
That language is one portion of the strings accepted by the original automaton( those strings hose
third position from the end has a A(
No e start again and eliminate state D instead of &( since D has no successors( The resulting to-state
automaton is shon in the figure(
Thus. e can apply the rule for to-state automata and simplify the expression to get .0H10
G
1.0H10( This
expression represents the other type of string the automaton accepts+ those ith a A in the second position
from the end( All that remains is to sum the to expressions to get the expression for the entire automaton(
This expression is
.0H10
G
1.0H10 H .0H10
G
1.0H10.0H10
D,8c!1<, )+, c"/A,!81"/ "> !,6(3*! ,72!,881"/ 1/)" *()"#*)*? ."!0
E723*1/ )+, c"/A,!81"/ "> !,6(3*! ,72!,881"/ 1/)" >1/1), *()"#*)*? .10 #*!@80
The automata can be constructed for single symbols . and ( And they can be combined into larger
automata that accept the union. concatenation. or closure of the language accepted by smaller automata( All
of the automata e construct are -NFA/s ith a single accepting state(
T+,"!,#: !"ery language defined by a regular expression is also defined by a
finite automaton(
!"">: Suppose LBL< ) = for a regular expression )( >e sho that LBL<!= for
some -NFA ! ith+
A( !xactly one accepting state
F( No arcs into the initial state
H( No arcs out of the accepting state
BASIS
There are three parts to the basis as shon in the figure(
Part<a= shos to handle the expression ( The language of the automaton is easily seen to be CE.
since the only path from the start state to an accepting state is labeled (
Part <b= shos the construction for ( &learly there are no paths from start state to accepting state. so
is the language of this automaton(
Finally. part <c= gi"es the automaton for a regular expression a( the language of this automaton
consists of the one string a. hich is also L<a=( It is easy to chec* that these automata all satisfy
conditions <A=. <F= and <H= of the inducti"e hypothesis(

INDUCTION
The three parts of the induction are shon in the figure( It is assumed that the statement of the theorem is
true for the immediate sub expressions of a gi"en regular expression1 that is. the languages of these sub
expressions are also the languages of -NFA/s ith a single accepting state( The four cases are+
The expression id )3S for some smaller
expressions ) and S( Then the automaton of fig<a=
ser"es( That is. starting at the ne start state. e can
go to the start state of either ) or S( >e then reach
the accepting state of one of these automata.
folloing a path labeled by some string in L<)= or
L<S=( Lnce e reach the accepting state of the
automaton for ) or S. e can follo one of the -
arcs to the accepting state of the ne automaton(
Thus. the language of the automaton in fig<a= shos
L<)=L<S=(
The expression is )S for some smaller expression )
and S( The automaton for the concatenation is
shon in fig<b=( Note that the start state of the first
automaton becomes the start state of the hole. and
the accepting state of the second automaton
becomes the accepting state of the hole( The idea
is that the only paths from the start state to
accepting state go first through the automaton for ). here it must follo a path labeled by a string
in L<)=. and then through the automaton for S. here it follos a path labeled by a string in L<S=(
Thus. the paths in the automaton of fig<b= are all and only those labeled by strings in L<)= L<S=(
The expression is )
:
for some smaller expression )( Then e use the automaton of fig<c=( That
automaton allos us to go either(
A( Directly from the start state to the accepting state along a path labeled hich is in L<)=: no
matter hat ) is(
F( To the start state of the automaton for ) through that automaton one or more times. and then
to the accepting state( This set of paths allos us to accept in L<)=. L<)= L<)=. L<)= L<)=
L<)=. and so on. thus co"ering all strings in L<)=: except hich as co"ered by direct arc
to the accepting state mentioned in H<a=(
The expression is <)= for some smaller expression )( The automaton of ) also ser"es as the
automaton for <)= since the parenthesis do not change the language defined by the expression(
C"/c3(81"/+ It is a simple obser"ation that the constructed automata satisfy the three conditions gi"en in the
inducti"e hypothesis( Lne accepting
state. ith no arcs into the initial
state or out of the accepting state(
E7*#23,
Let us con"ert the regular expression
<D3A=
:
A<D3A= to an -NFA(
Lur first step is to construct
an automaton for D3A(
Next. e apply closure
operation(
The third automaton in the
concatenation is another
automaton for D3A(
The step by step construction
is shon in the figure(

S)*), )+, *36,<!*1c 3*:8 >"! !,6(3*! ,72!,881"/8. ."!0
E723*1/ )+, A36,<!*1c 3*:8 *A*13*<3, 1/ !,6(3*! ,72!,881"/8. .10 #*!@80
Li*e arithmetic expressions. the regular expressions ha"e a number of las that or* for them( %any of
these are similar to the las for arithmetic. if e thin* of union as addition and concatenation as
multiplication(
C"##()*)1A1)-
&ommutati"ity is the property of here the order of its operands can be sitched and gets the same result(
An example for arithmetic is gi"en as x3y B y3x(
L3% B %3L( This la. the commutati"e la for union. says that e may ta*e the union of to
languages in either order(
A88"c1*)1A1)-
Associati"ity is the property of an operator that allos us to group the operands hen the operator is applied
tice( For eg(. the associati"e la of multiplication is <x y= $ B x <y $=(
<L3%=3N B L3<%3N=( The union of three languages either by ta*ing the union of the last to
initially. or ta*ing the union of the last to initially(
<L%=N BL<%N=( This la. the associati"e la for concatenation. says that e can concatenate three
languages by concatenating either the first to or the last to initially(
I9,/)1)1,8
An identity for an operator is a "alue such that hen the operator is applied to the identity for addition. since
D3x B x3DBx. and A is the identity for multiplication. since A y B y A B y(
3 L B L 3 B L( This la asserts that is the identity for union(
L B L BL( This la asserts that is the identity for concatenation(
A//1+13*)"!8
An annihilator for an operator is a "alue such that hen the operator is applied to the annihilator and some
other "alue. the result is the annihilator( For instance. D is an annihilator for multiplication. since D y B y
D BD( There is no annihilator for addition(
L B LB ( This la asserts that is the annihilator for concatenation(
D18)!1<()1A, L*:8:
A distributi"e la in"ol"es to operators. and asserts that one operator can be pushed don to be applied to
each argument of the other operator indi"idually( The most common example from arithmetic is the
distributi"e la of multiplication o"er addition. that is x <y 3 $= B x y 3 x $( To types are
L<%3N= B L% 3 LN( This la. is the left distributi"e la of concatenation o"er union(
<%3N=L B %L 3 NL( This la. is the right distributi"e la of concatenation o"er union(
T+, I9,#2"),/) L*::
An operator is said to be idempotent if the result of applying it to to of the same "alues as arguments is that
"alue( The common arithmetic operators are not idempotent1 x 3x x in general and x x x in general(
2oe"er. union and intersection are common examples of idempotent operators( Thus. for regular
expressions.
L3LBL This la. the idempotent la for union. states that if e ta*e the union of to identical
expressions. e can replace them by one copy of the expression(
L*:8 1/A"3A1/6 c3"8(!,8:
There are a number of las in"ol"ing the closure operators as gi"en belo
<L
:
=
:
This la says that closing an expression that is already closed does not change the
language( That is <L
:
=
:
L
:


:
B ( The closure of contains only the string (

:
B ( It is easy to chec* that the only string that can be formed by concatenating any
number of copies of the empty string is the empty string itself(
L
3
B LL
:
B L
:
L
L
:
B L
3
3 (
D,8c!1<, )+, 2(#21/6 L,##* >"! !,6(3*! 3*/6(*6,8 */9 *2231c*)1"/8 "> )+, 2(#21/6 L,##*.
."!0
!"A, )+*) )+, 3*/6(*6,8 *!, /") )" <, !,6(3*!. .10
#*!@80
)egular languages ha"e atleast four different descriptions( They are the languages accepted by DFA/s. by
NFA/s and by -NFA/s1 they are also the languages defined by regular expressions( Not e"ery language is a
regular language( >e shall introduce a poerful techni#ue. *non as the ,pumping lemma- for shoing
certain languages not to be regular(
T+, (#21/6 L,##* >"! R,6(3*! L*/6(*6,8
Let us consider the language L
DA
B C D
n
A
n
J n AE( This language contains all strings DA. DDAA.
DDDAAA. and so on. that consist of one or more D/s folloed by an e#ual number of A/s(
L
DA
is not a regular language(
Since it as in some state # hen the A/s started. it cannot ,remember- that ho number of D/s it has
recei"ed already( So this language can/t be represented by using automata and regular expression(
T+, 2(#21/6 3,##* >"! !,6(3*! 3*/6(*6,8
Let L be a regular language( Then there exists a constant n such that for e"ery string in L such that J J
n. the string can be di"ided into three strings. B xy$. such that+
A( y
F( J xy J n
H( For all * D. the string xy
*
$ is also in L(
Then LB L<A= for some DFA A( Suppose A has n states( No. consider any string of length n or more. say
B a
A
a
F
GGa
m
. here m n and each a
i
is an input symbol( no. e can brea* B xy$ as follos+
A(
xBa
A
a
F
G((a
i
F(
y B a
i3A
a
i3F
G((a
K
H(
$ B a
K3A
a
K3F
G((a
m
The folloing figure shos the DFA for the regular languages(
=!1), 8+"!) /"),8 "/ D,c181"/ 2!"2,!)1,8 "> R,6(3*! L*/6(*6,8 ."!0
E723*1/ R,6(3*! L*/6(*6,8 <*8,9 "/ 9,c181"/ 2!"2,!)1,8. .10 #*!@80
&onsider some of the fundamental #uestions about languages+
A( Is the language described empty?
F( Is a particular string in the described language?
H( Do to descriptions of a language actually describe the same language? This #uestion is often called
,e#ui"alence- of languages(
CONVERTING AMONG RERESENTATIONS
>e can con"ert any of the four representations for regular languages to any of the other three
representations( >hile there are algorithms for any of the con"ersions. sometimes e are interested not only
in the possibility of ma*ing a con"ersion. but in the amount of time it ta*es(
C"/A,!)1/6 NFAB8 )" DFAB8
>hen e start ith either an NFA or -NFA and con"ert it to a DFA. the time can be exponential in
the number of states of the NFA(
First. computing the -closure of n states ta*es L<n
H
= time(
Lnce the -closure is computed. e can compute the e#ui"alent DFA by the subset construction(
Then from the subset. the reachable states from the starting state and their corresponding outputs are
considered for the DFA
DFA-)"-NFA c"/A,!81"/
%odify the transition table for the DFA by putting set-brac*ets around states and. if the output is an -NFA.
adding a column for ( Since e treat the number of input symbols as a constant. copying and processing the
table ta*es L<n= time(
A()"#*)"/-)"-R,6(3*! E72!,881"/ c"/A,!81"/
The n
H
expressions can ta*e time L<n
H
N
n
=(The same construction or*s in the same running time if the input
is an NFA. or an -NFA( If e first con"ert an NFA to a DFA and then con"ert the DFA to a regular
expression. it could ta*e time L<n
H
N
nF
=. hich is doubly exponential(
R,6(3*!-E72!,881"/ ')"-A()"#*)"/ C"/A,!81"/
&on"ersion of a regular expression to an -NFA ta*es linear time( >e need to parse the expression
efficiently. using a techni#ue that ta*es only L<n= time on a regular expression of length n
N
( The result is an
expression tree ith one node for each symbol of the regular expression( Lnce e ha"e an expression tree
for a regular expression. e can or* up the tree. building the -NFA for each node(
TESTING EMTINESS OF REGULAR LANGUAGES
JIs regular language L empty?K The anser is+ is empty and all other regular languages are not( If our
representation is any *ind of finite automaton. the emptiness #uestion is hether there is any path
hatsoe"er from the start state to some accepting state( If so. the language is nonempty. hile if the
accepting states are all separated from the start state. then the language is empty( The algorithm can be
summari$ed by this recursi"e process(
BASIS The start state is surely reachable from the start state(
INDUCTION &ompute the set of reachable states from each state( If the set contains any accepting
state. then the language is nonempty otherise it is empty(
The folloing recursi"e rules tell hether a regular expression denotes the empty language(
BASIS denotes the empty language1 and a for any input symbol a do not(
INDUCTION Suppose ) is a regular expression( There are four cases to consider. corresponding to
the ays that ) could be constructed
A( ) B )
A
3 )
F
( Then L<)= is empty if and only if both L<)
A
= and L<)
F
= are empty(
F( ) B )
A
)
F
( Then L<)= is empty if and only if either L<)
A
= or L<)
F
= is empty(
H( ) B )
A
:
( Then L<)= is not empty1 it alays includes at least (
N( ) B <)
A
=( Then L<)= is empty if and only if L<)
A
= is empty. since they are the same
language(
TESTING MEMBERSHI IN A REGULAR LANGUAGE
'i"en a string and a regular language L. the problem is to find out hether the gi"en string can be
accepted by the language or not( >hile is represented explicitly. L is represented by an automaton or
regular expression(
If the language L is represented by an automaton. then for the gi"en string. if the DFA ends in an
accepting state. the anser is ,yes-1 otherise the anser is ,no-(
If L has any other representation besides a DFA. li*e NFA. then e could con"ert to a DFA and
run the test abo"e(
If the NFA has -transitions. then e must compute the -closure before starting the simulation(
Then the processing of each input symbol a has to stages. each of hich re#uires L<s
F
= time(
Lastly. if the representation of L is a regular expression of si$e s. e can con"ert to an -NFA
ith at most Fs states. in L<s= time( >e then perform the simulation abo"e. ta*ing L<ns
F
= time on
an input of length n(
ELUIVALENCE AND MINIMIMATION OF AUTOMATA
In this. e discuss ho to test hether to descriptors for regular languages are e#ui"alent. in the sense that
they define the same language( An important conse#uence of this test is that there is a ay to minimi$e a
DFA( That is. e can ta*e any DFA and find an e#ui"alent DFA that has the minimum number of states(
D,8c!1<, *<"() T,8)1/6 EF(1A*3,/c, "> 8)*),8. ."!0
E723*1/ *<"() )+, ),8)1/6 ,F(1A*3,/c, "> 8)*),8. .10 #*!@80
Lur goal is to understand hen to distinct states p and # can be replaced by a single state that beha"es li*e
both p and #( >e say that states p and # are e#ui"alent if+
For all input strings . p must be accepting state if and only if # is an accepting state(
Poth are accepting or both are non-accepting(
If to states are not e#ui"alent. then e say they are distinguishable(
For example consider the DFA.
& and ' are not e#ui"alent because
one is accepting and the other is not(
&onsider states A and '(
o String doesn/t distinguish
them because both are non-
accepting states(
o For String D. they go to states
P and '. and both states are
non-accepting(
o Li*eise. string A doesn/t
distinguish A from '. because
they go to F and !.
respecti"ely. and both are non-accepting(
o 2oe"er. DA distinguishes A from '. because <A.DA= B &. <'. DA= B !. & is accepting. and
! is not(
o So the states A and ' are not e#ui"alent(
In contrast. consider states A and !(
o Neither is accepting. so does not distinguish them(
o Ln input A. they both go to state F(
o >ith D. they go to states P and 2. respecti"ely( Since neither is accepting. string D by itself
does not distinguish A from !(
o Ln input A they both go to &. and on input D they both go to '( Thus. all inputs that begin
ith D ill fail to distinguish A from !(
o So both are e#ui"alent states(
To find states that are e#ui"alent. e ma*e our best efforts to find pairs of states that are distinguishable( The
folloing table-filling algorithm is used to find out the non-e#ui"alent state pairs(
BASIS If p is an accepting state and # is non accepting. then the pair Cp.#E is distinguishable(
INDUCTION: Let p and # be states such that for some input symbol a. r B <p. a= and s B <#. a= are
a pair of states *non to be distinguishable(
Then Cp.#E is a pair of distinguishable states(
!xecute the table-filling algorithm on the abo"e DFA( The final table is
shon in the folloing figure. here an x indicates pairs of
distinguishable states. and the blan* s#uares indicate those pairs that
ha"e been found e#ui"alent( Initially. there are no x/s in the table(
For the basis. since & is the only accepting state. e put x/s in each pair that in"ol"es &(
For instance. since C&.2E is distinguishable. and states ! and F go to 2 and &. respecti"ely. on input
D. e *no that C!.FE is also a distinguishable pair(
The pair CA.'E can be disco"ered simply by loo*ing at the transitions from the pairs of states on
either D or A. and obser"ing that one state goes to & and the other does not(
>e can sho CA('E is distinguishable on the next round. since on input A they go to F and !.
respecti"ely. and e already established that the pairC!.FE is distinguishable(
The three remaining pairs. hich are therefore e#ui"alent pairs. are CA.!E. CP.2Eand CD.FE(
For eg(. consider hy e cannot infer that CA.!E is a distinguishable pair( Ln input D. A and ! go to
P and 2. respecti"ely. and CP.2E has not yet been shon distinguishable( Ln input A. A and ! both
go to F. so there is no hope of distinguishing them that ay(
The other to pairs. CP.2E and CD.FE ill ne"er be distinguished because they each ha"e identical
transitions on D and identical transitions on A(
Thus. the table-filling algorithm steps ith the table as shon in figure. hich is the correct
determination of e#ui"alent and distinguishable states(
T,8)1/6 EF(1A*3,/c, "> R,6(3*! L*/6(*6,8
The table-filling algorithm gi"es us an easy ay to test if to regular
languages are the same( Suppose languages L and % are each represented
in some ay. eg(. one by a regular expression and one by an NFA(
&on"ert each representation to a DFA( No. imagine one DFA hose
states are the union of the states of the DFA/s for L and %( Technically.
this DFA has to start states. but actually the start state is irrele"ant as far
as testing state e#ui"alence is concerned. so ma*e any state the lone start
state(
No. test if the start states of the to original DFA/s are e#ui"alent. using
the table-filling algorithm( If they are e#ui"alent. then LB% and if not.
then L%(
E7*#23,: &onsider the to DFA/s in the figure( !ach DFA accepts the empty string
and all strings that end in D1 that is the language of regular expression 3<D3A=
:
D( >e
can imagine that the figure represents a single DFA. ith fi"e states A through !( If
e apply the table-filling algorithm to that automaton. the result is as shon in the
figure(
To see ho the table is filled out. e start by placing x/s in all pairs of states here
exactly one of the states is accepting( It turns out that there is no more to do( The
four remaining pairs.CA.&E.CA.DE.C&.DE and CP.!E are all e#ui"alent pairs( ;ou should chec* that no more
distinguishable pairs are disco"ered in the inducti"e part of the table-filling algorithm( For instance. ith the
table as in fig(N(AA. e can/t distinguish the pair CA.DE because on D they go to themsel"es and on A they go
to the pair CP.!E. hich has not yet been distinguished( Since A and & are found e#ui"alent by this test. and
those states ere the start states of the to original automata. e conclude that these DFA/s do accept the
same language(
E723*1/ *<"() M1/1#1N*)1"/ "> DFAB8. ."!0
G1A, * <!1,> /"), "/ M1/1#1N*)1"/ "> DFAB8. .10 #*!@80
For each DFA e can find an e#ui"alent DFA that has
a fe states as any DFA accepting the same language(
The central idea behind the minimi$ation of DFA/s is
that the notion of state e#ui"alence lets us partition the
states into bloc*s such that+
A( All the states in a bloc* are e#ui"alent
F( No to states chosen from to different
bloc*s are e#ui"alent(
E7*#23,
&onsider the first DFA and the table of the
pre"ious #uestion. here e determined the
state e#ui"alences and distinguishabilities for
the states in the DFA( The partition of the
states into e#ui"alent bloc*s is <CA.!E.CP.2E.
C&E.CD.FE.C'E=(
Notice that the three pairs of states that are
e#ui"alent are each placed in a bloc* together.
hile the states that are distinguishable from
all the other states are each in a bloc* alone(
For the automaton of second DFA. the partition is <CA.&.DE.CP.!E=(This example shos that e can
ha"e more than to states in a bloc*( It may appear that A. & and D can all li"e together in a bloc*.
because e"ery pair of them is e#ui"alent. and none of them is e#ui"alent to any other state(
&onsider the first DFA. The start state is CA.!E.since A as the start state of the DFA( The only accepting
state is C&E. since & is the only accepting state of DFA( The transitions of the folloing figure properly
reflect the transitions of original DFA( For instance. it has a transition on input D from CA.!E to CP.2E( If e
examine the original DFA. e find that both A and ! go to F on input A. so the selection of the successor of
CA.!E on input A is also correct( &hec* that all of the other transitions are also proper(
=!1), 8+"!) /"),8 "/ M""!, */9 M,*3- #*c+1/,8. ."!0 .10
#*!@80
E723*1/ M""!, */9 M,*3- #*c+1/,8.
Lne limitation of the finite automaton as e ha"e defined it is that its output is limited to a binary signal+
,accept- 0 ,don/t accept(- %odel in hich the output is chosen form some other alphabet ha"e been
considered (There are to distinct approaches1 the output may be associated ith the state <called a %oore
machine= or ith the transition <called a %ealy machine=( >e shall define each formally and then sho that
the to machine types produce the same input-output mappings(
M""!, #*c+1/,8
A %oore machine is a six-tuple <M. V. W. X. Y.#D= . here M. V. Y and #D are as in the DFA( W is the output
alphabet and X is a mapping from M to W gi"ing the output associated ith each state( The output of % in
response to input aA.aF (((( an . nZD. is X <#D= X <#A= G( X <#n=. here #D.#A. G(( #n is the se#uence of
states such that Y <#i-A.aA= B #i for A[ i [ n( Note that any %oore machine gi"es output X <#D= in
response to input \( The DFA may be "ieed as a special case of a %oore machine here the output
alphabet is CD.AE and state # is ,accepting- if and only if X <#=BA(
E7*#23, :
Suppose e ish to determine the residue mod H for each binary string treated as a binary integer( To begin.
obser"e that if I ritten in binary is folloed by a D. the resulting string has "alue Fi. and if I in binary is
folloed by a remainder of Fi0H is Fp mod H( If p B D. A. or F. then Fp modH is D. F. orA. respecti"ely(
Similarly. the remainder of <Fi3A=0H is A. D. or F. respecti"ely( It suffices therefore to design a %oore
machine ith three states. #D. #A. and #F. here #K is entered if and only if the input seen so far the residue K(
>e define X<#K= B K for K B D. A. and F (
In the abo"e transition diagram. here outputs label the states( The transition function T is designed to
reflect the rules regarding calculation of residues described abo"e( Ln input ADAD the se#uence of states
entered is #D.#A.#F.#F.#A. gi"ing output se#uence DAFFA( That is. ! <hich has ,"alue-D= has residue D. A has
residue A.F <in decimal= has residue F . and AD <in decimal= has residue A(
M,*3- #*c+1/,8
A %ealy machine is also a six-tuple % B <M. V. W. X. Y. #D=. here all is as in the %oore machine. except X
that maps M : V to W( That is. X<#.a= gi"es the output associated ith the transition from state # on input a.
The output of % in response to input aA.aF G( An is X <#D.aA= X <#A.aF= G( X <#n-A.an=. here #D.#A. G(( #n
is the se#uence of states such that Y <#i-A.aA= B #i for A [ i [ n( Note that this se#uence has length n rather
than length n 3 A as for the %oore machine. and on input \ a %ealy machine gi"es output \(
E7*#23,
!"en if the output alphabet has only to symbols. the %ealy machine model can sa"e states hen compared
ith a finite automation( &onsider the language <D 3 A=:<DD 3 AA= of all strings of D/s and A/s hose last to
symbols are the same( In the next chapter e shall de"elop the tools necessary the sho that this language is
accepted by no DFA ith feer than fi"e states( 2oe"er. e may define a three-state %ealy machine that
uses its state to remember the last symbol read. emits output y hene"er the current input matches the
precious one. and emits n otherise( The se#uence of y/s and n/s emitted by the %ealy machine corresponds
#
D
#
A
#
F
D
D
D
A
A
A
A %oore machine calculating residues
#
D
P
D
P
A
start
D0n
D0N
A0n
D0n
A0n
A0N
%ealy machine
to the se#uence of accepting and non accepting states entered by a DFA on the same input1 hoe"er. the
%ealy machine does not ma*e an output prior to any input. hile the DFA reKects the string !. as its initial
state is nonfinal(
The %ealy machine % B <C#D.pD.pAE.CD.AE.Cy.nE. Y. X.#D= is shon in belo figure( >e use the label a0b on
an arc from state p to state # to indicate that Y <p.a= B # and X <p . a= B b( The response of % to input DAADD
is nnyny. ith the se#uence of states entered being #DpDpApApDpD( Note ho pD remembers a $ero and pA
remember a one( State #D is initial and ,remembers- that no in
E/9 "> U/1) -4
U N I T - III
E723*1/ C"/),7) F!,, G!*##*! .CFG0 :1)+ ,7*#23,? ."!0
E723*1/ )+, c"#2"/,/)8 "> c"/),7) >!,, 6!*##*!? .5 #*!@80
There are four important components in a grammatical description of language+
A( There is a finite set of symbols that form the strings of the language being defined( This set as
CD.AE(>e call this alphabet the terminals or terminal symbols(
F( There is a finite set of "ariables also called sometimes non-terminals or syntactic categories( !ach
"ariable represents a language that is a set of strings(
H( Lne of the "ariables represents the language being defined1 it is called the start symbol( Lther
"ariables represent auxiliary classes of strings that are used to help define the language of start
symbol(
N( There is a finite set of productions or rules the represents the recursi"e definition of a language( !ach
production consist of
<a= A "ariable that is being defined by the production and it is being defined by the production
and it is called as the head of the production(
<b= The production symbol(
<c= A string of $ero or more terminals and "ariables( This string called the body of the production
represents one ay to form strings in the language of the "ariable of the head(
The &F' ' can be represented by its N components that is 'B<].T.P.S=
] set of "ariables
T terminals
P set of productions
S start symbol(
E7*#23,-1+ &onsider the context free grammar for palindromes(
A( P
F( P D
H( P A
N( P DPD
I( P APA
The grammar '
pal
for palindrome is represented by+ '
palB
<CpE.CD.AE.A.P=
here A represents the set of I production as seen abo"e(
E7*#23,-4+ &onsider the set of identifiers only ith the letters a. b and the digits D and A( !"ery identifier
must begin ith a or b. hich may be folloed by any string in Ca.b.D.AE ( >e need to "ariables in this
grammar one e call !. represents the expressions( It is the start symbol and represents the language of
expressions e are defining( The other "ariable I represents the identifier( The regular expression for the
language is <a3b= <a3b3D3A=:( This expression can be con"erted into a &F' as follos+
A( ! I
F( ! ! 3 !
H( ! ! : !
N( ! < ! =
I( I a
Q( I b
T( I Ia
R( I Ib
S( I ID
AD( I IA
The grammar for expressions is started finally as ' B <C!.IE.T.P.!= here T is the set of symbols C3.:.
<.=a.b.D.AE and P is the set of productions shon abo"e(
)ule <A= is the basis rule for expressions( It says that expressions can be a single identifier(
)ule <F= through<N= describe the inducti"e case fro expressions(
)ule<F= say that an expression can be to expressions connected by a plus sign.
)ule <H=says the same ith a multiplication sign(
)ule <N= says that if e ta*e any expression and put matching parentheses around it. the result is
also an expression(
)ules<I= through<AD= describe identifiers( I(
The basis is rules<I=and <Q=1 they say that a and b are identifiers( The remaining N rules are the
inducti"e case. They say that if e ha"e any identifier. e can follo it by a.b.D<or=A. and the
result ill be another identifiers(
D18c(88 )+, 9,!1A*)1"/8 "> 6!*##*!8 :1)+ ,7*#23,? ."!0
=!1), 8+"!) /"),8 "/ .10 R,c(!81A, 1/>,!,/c, */9 .110 D,!1A*)1"/8? .5 #*!@80
There are to approaches found for the inference of &F'( The more con"entional approach is to use the
rules from body to head( >e ta*e strings *non to be in the language of each of the "ariable of the body.
concatenate them. in the proper order ith any terminals appearing in the body and infer that the
resulting strings is in the language of the "ariable in the head( >e shall refer to this procedure as
!,c(!81A, 1/>,!,/c,(
E7*#23,:
Let us consider some of the inferences e ma*e using the grammar for expression <a3b= <a3b3D3A=:(
For example. line <i= says that e can infer string a is in the languages for I by using production I( Lines
<ii= through <i"= says e can infer that bDD is an identifier by using production Q once<to get the b= and
then applying production S tice<to attach the to D/s=(
Lines <"= and <"i= exploit production A to infer that. since that any identifier is an expression. the strings
a and b
DD
. hich e inferred in lines <i= and <i"= to be identifiers are also in language of "ariable !(
Line<"ii= uses production F to infer that the sum of these identifiers is an expression. line<"iii= uses
production N to infer that the same string ith parentheses around it is also an identifier and line<ix= use
production H to multiply the identifier a by the expression e had disco"ered in line<"iii=(
There is another approach to define the language of a grammar. in hich e use the productions from
head to body( >e expand the start symbol using one of its productions( >e further expand the resulting
string by replacing on of the "ariables by the body of one of its production and so on. until e deri"e a
string consisting entirely of terminals( The language of grammar is all strings of terminals that e can
obtain in this ay( This use of grammar is called D,!1A*)1"/8.
The process of deri"ing strings by applying productions from head to body re#uires the definition of a
ne relation symbol ( Suppose 'B<].T.P.S= is a &F'(
Let A ^ _ be a production of '( Then e say A` ^ _ `(
>e may entered the relationships to represent $ero. one or many deri"ation steps. For
deri"ation e use a : to denote ,$ero or more steps-(
!xample+ The inference that a
:
<a3bDD= is in the language of "ariable ! can be reflected in a deri"ation of
that string. starting ith the string !( 2ere is one such deri"ation(
G1A, 8+"!) /"),8 "/ 3,>) #"8) */9 !16+)#"8) 9,!1A*)1"/8? ."!0
D18c(88 LMD */9 RMD? .5 M*!@80
To restrict the number of choices e ha"e in deri"ing a string. it is often useful to re#uire that at each step.
e replace the leftmost "ariable by one of its production bodies such a deri"ation is called as leftmost
deri"ation and e indicate that a deri"ation is leftmost by using the relations
lm

and.
:
lm

for one or many


steps respecti"ely( If the grammar ' that is being used is not ob"ious. e can place the name ' belo these
symbols if it is not clear hich grammar is being used(
Similarly it is possible to re#uire that at each step the rightmost "ariable is replaced by one of its bodies. if
so. e call the deri"ation rightmost and use the symbols
rm

and.
:
rm

to indicate one or more rightmost


deri"ation steps. respecti"ely( Again the name of the grammar may appear belo these symbols if it is not
clear hich grammar is being used(
E7*#23,:
&onstruct L%D for the gi"en string Ba
:
< a3bDD= by considering the production for <a3b=<a3b3D3A=
:
(
There is a rightmost deri"ation that user the same replacements for each "ariable. although it ma*es the
replacement in different order( This rightmost deri"ation is
Any deri"ation has an e#ui"alent leftmost and an e#ui"alent rightmost deri"ation(
E723*1/ C"/),7) F!,, L*/6(*6,? ."!0
E723*1/ )+, 3*/6(*6, "> )+, 61A,/ 6!*##*!? .5 #*!@80
If '<].T.P.S= is a &F'. the language of '. denoted L<'=. is the set of terminal strings that ha"e deri"ations
from the start symbol( That is
L<'= B C in T J S
:
G

E
If a language L is the language of some context free grammar. then L is said to be a context free language or
&FL( For instance. e asserted that the grammar of the language of palindromes o"er alphabet CD.AE( Thus
the set of palindrome is a context free language(
!x+ &ontext free grammar for palindromes(
E723*1/ 8,/),/)1*3 >"!# :1)+ ,7*#23,? ."!0
E723*1/ !16+) 8,/),/)1*3 >"!# */9 3,>) 8,/),/)1*3 >"!#? .5 #*!@80
Deri"ations from the start symbol produce strings that ha"e a special role( >e call these ,sentential
forms-(
That is if 'B<].T.P.S= is a &F'. then any string ^ in <] 4 T=: such that
s ^ is a sentential form(
If s
:
lm
^ . then ^ left sentential form and if
s
:
rm
^ .then ^ is a right sentential form(
Note that the language L<'= is those sentential forms that are in T:. that is they consist solely of
terminals(
&onsider the grammar for expressions( For example !:<I3!= is a sentential form. since there is a
deri"ation
! !:!
!:<!=
!:<!3!=
!:<!3I=
2oe"er this deri"ation is neither leftmost nor rightmost. since at the last step. the middle ! is replaced(
As an example of left-sentential form. consider a:!. ith the leftmost deri"ation(
! !:!
I:!
a:!
Additionally the deri"ation
! !:!
!:<!=
!:<!3!=
shos that !:<!3!= is a right sentential form(
=!1), 8+"!) /"),8 "/ 2*!8, )!,,8? ."!0
E723*1/ )+, c"/8)!(c)1"/ "> 2*!8, )!,,8? ."!0
G1A, )+, c"/8)!(c)1"/ "> 2*!8, )!,, :1)+ ,7*#23,? .10 #*!@80
There is a tree representation for deri"ations that has pro"ed extremely useful( This tree shos clearly ho
the symbols of a terminal string are grouped into substrings. each of hich belongs to the language of one of
the "ariables of the grammar( The trees are *non as ,parse tree-( In compiler the tree structure of the
source program facilitates the translation of the source program into executable code by alloing natural.
recursi"e functions to perform this translation process(
Let us fix on a grammar ' B <].T.P.S=( The parse tree for ' is a tree ith the folloing condition(
A( !ach interior node is labeled by a "ariable in ](
F( !ach leaf is labeled by a "ariable. a terminal. or ( 2oe"er. if the leaf is labeled . then it must be
the only child of its parent(
H( If an interior node is labeled A. and its children are labeled
5A.5F.GGG(.5
*(
respecti"ely from the left. then A xA.xF.GGG(.x
*(
is a production in
P( Note that the only one time one of the x/s can be is if that is the label
of the only child. and A is a production of '(

The folloing figure shos a parse tree denoting the deri"ation of I 3 !
from !( The fig<b= shos a parse tree shoing the deri"ation P DAAD
T+, -1,39 "> )+, 2*!8, )!,,:
If e concatenate the lea"es of the
parse tree from the left. e get a string
called the yield of the tree. hich is
alays a string deri"ed from root
"ariable(
The special importances of parse tree
are
A( The yield is a terminal string( That is all lea"es are labeled
either ith a terminal or ith (
F( The root is labeled by the start symbol(
The folloing figure is an example of a tree ith a terminal string
as yield and the start symbol at the root( This particular parse tree is
the representation of that deri"ation a:<a3bDD=(
G1A, * <!1,> /"), "/ )+, I/>,!,/c,8, D,!1A*)1"/8 O *!8, T!,,8? .10 #*!@80
'i"en grammar 'B<].T.P.S=. e shall sho that the folloing are e#ui"alent(
A( The recursi"e inference determines that terminal string 6/ is in the language of "ariable A(
F(
:
w A
H(
w A
lm
:


N(
w A
rm
:

I( There is a parse tree ith root A and yield (


In fact. except for the use of recursi"e inference. hich e only defined for terminal strings all the other
conditions - the existence of deri"ations. leftmost or rightmost deri"ations. and parse trees are also
e#ui"alent if is a string that has some "ariables( That is each are in that diagrams indicates that e pro"e a
theorem that says if meets the condition at the trail. then it meets the condition at the head of the arc(
F!"# I/>,!,/c, )" T!,,8:-

Let 'B<].T.P.S= be a &F'( If the recursi"e inference procedure tells us that
the terminal string is in the language of "ariable A. then there is a parse
tree ith root A and yield (
F!"# T!,,8 )" D,!1A*)1"/
In order to understand ho deri"ations may be constructed. e need first to see ho one deri"ation of a
string from a "ariable can be embedded ithin another deri"ation( Let us consider the expression grammar(
It is easy to chee* that there is a deri"ation
! I Ib ab
As a result. for any string ^ and `. it is also true that
! ` ^ I ` ^ Ib ` ^ ab `
The Kustification is that e can ma*e the same replacements of production bodies of for heads in the context
of ^ and `( For instance if e ha"e a deri"ation that begins ! !3! !3<!=. e could apply the eri"ation
of ab from the second ! by treating ,!3<- as ^ and ,=- as `( This deri"ation ould then continue
!3! !3<I= !3<Ib= !3<ab=(
E74+ Let us construct the leftmost deri"ation from the folloing tree(
E E G E
I G E
* G E
* G . E 0
* G . E H E 0
* G . I H E 0
* G . * H E 0
* G . * H I 0
* G . * H I0 0
* G . * H I00 0
* G . * H <00 0
A
:
F!"# D,!1A*)1"/8 )" R,c(!81A, I/>,!,/c,8
>hene"er there is a deri"ation A for some &F'. then the fact that is in the language of A is
disco"ered in the recursi"e inference procedure( Suppose that e ha"e a deri"ation
A 5A.5F.GGG(5
*
(
Then e can brea* into pieces BA.F.GGG(( (
*
such that 5
i

i
( Note

that if

5
i
is a terminal. then

i
B5
i
. and the deri"ation is $ero steps(
If 5
i
is a "ariable. e can obtain the deri"ation of 5
i

i
by starting ith the deri"ation A and
stripping aay(
All the position of the sentential forms that are either to the left or right of the positions that are
deri"ed from 5
i
(
All the steps are not rele"ant to the deri"ation of
i
from 5
i
(
E723*1/ )+, >"!#*3 9,>1/1)1"/ "> 2(8+ 9":/ *()"#*)*? ."!0
G1A, 8+"!) /"),8 "/ )+, 9,>1/1)1"/ "> 2(8+ 9":/ *()"#*)*? .5 #*!@80
Lur formal definition or notation for pushdon automation <PDA= in"ol"es se"en components( >e rite the
specification of a C .L, , , , F
0
, M
0
, F0
The components ha"e the folloing meanings(
L A finite set of states. li*e the states of a finite automation(
A finite set of input symbols(
A finite set of symbols that are alloed to push onto the stac*(
The transition function( Formally. ta*es as argument a triples <#.a.5= here
# is state in M(
a is either an input symbol or \
5 is a stac* symbol that is a member of F(
F
0
The stac* state. The PDA is in this state before ma*ing any transition(
M
0
The start symbol initially. the PDA/s stac* consists of one instance of this symbol and nothing else(
F The set of accepting states or final states(
E7*#23,
Let us design a PDA p to accept the language L
n)(
Lf e shall use the stac* symbol. $D to mar* the bottom
of the stac*( >e need to ha"e this symbol present so that. after e pop off the stac* and reali$e that e
ha"e seen
)
on the input e still ha"e something or the stac* to permit us to ma*e a transition to the
accepting state #F( Thus our PDA for L
nr
can be described as
2C.DF0,F1,F4E,D0,1ED0,1,N0E, F0,N0,DF4E0
>here S is defined by the folloing rules(
A( <#
D
.D.8
D
=BC<#D.D8
D
=E and <#D.A. 8
D
=BC<#D.A8
D
=E( Lne of these rules applies initially. hen e are in
state #
D
and e see the stac* symbol 8
D
at the top of the stac*( >e need the first input and push it
once the stac*. lea"ing 8
D
belo to mas* the bottom(
F( <#D.D.D=BC<#D.DD=E. <#D.D.A=BC<#D.DA=E. <#D.A.D=BC<#D.AD=E. <#D.A.A=BC<#D.AA=E( These N.
similar rules allo us to ones the top of the stac* ands lea"ing the pre"ious top stac* symbol alone(
H( <#D. .8
D
=BC<#A.8
D
=E and <#D. ,D=BC<#A.D=Eand <#D. .A=BC<#.A=E( These H rules allo P to go
from state #D to state #A spontaneously <on input= lea"ing the symbol at the top of the stac* as it is(
N( BC<#A.D.D=EB <#A. =E and <#A.A.A=BC<#A.=E( No. in state #A e can match input symbols against
the top symbols on the stac* and pop hen the symbols match(
I( <#A. .8
D
=BC<#F.8
D
=E( Finally. if e expose the bottom of the stac* marches to and e are in state
#A. then e ma*e found and input of the form
)
( >e go to state #F and accept(
G1A, 8+"!) /"),8 "/ 6!*2+1c*3 /")*)1"/8 */9 9,8c!12)1"/ "> DA? ."!0
E723*1/ DA :1)+ !,>,!,/c, )" 6!*2+1c*3 /")*)1"/8 */9 9,8c!12)1"/8? .15 #*!@80
The transition diagram of a finite automation ill ma*e the aspects of the beha"ior of a gi"en PDA clearer(
So e introduce and use the transition diagram for PDA/s in hich+
A( The node corresponds to the states of a PDA(
F( An arro labeled start indicates the start state and doubly circled states are accept in. as for finite
automation(
H( The arcs correspond to transitions of the PDA in the folloing sense( An arc labeled a. x0 P from state
# to state p means that s<#.a.x= contains the pair<p. P0 perhaps among other pairs( That is the arc
labeled tells hat input is used and also gi"es
the old and ne tops of the stac*(
The only thing that use Left us is hich stac* symbol is
the start symbol( &on"entionally. it is 8
D
. unless e
indicate otherise( Thus it shos the graphical
representation of PDA(
I/8)*/)*/,"(8 9,8c!12)1"/ "> * DA:
The PDA goes from configuration to configuration. in
response to the input symbols <or=times ,but unli*e the
finite the state is the only thing that e need to *no about the automation. the pdas configuration of a pda
by a triple <#..P=. here
# is the state
is the remaining input and
P is the stac* contents(
E7*#23,
Let us consider the action of the PDA of example abo"e on the input AAAA( Since #D is the start state and $D
is the start symbol. the initial ID is <#D.AAAA.$D=( Ln this input PDA has an opportunity of guess rongly
se"eral times( The entire se#uence of ID/s that the PDA can reach from the initial Id<#D.AAAA.$D= as shon
belo(
E723*1/ )+, 3*/6(*6,8 "> DA? ."!0
G1A, * <!1,> /"), "/ 3*/6(*6,8 "> DA? .10 M*!@80
The PDA accepts its input by consuming it and entering an accepting state( >e call this approach
,acceptance by final start-( There is a second approach to defining for any PDA the language ,acceptance by
empty stac*- that is the set of strings that cause the PDA to empty its stac* starting from the initial string(
Acc,2)*/c, "> F1/*3 S)*),
Let pB<M. . . . #
D
. $
D
. F= be a PDA( Then L<p=. the language accepted by p by final state is
C J <#
D
. . $
D
= <#. \. ^= E

That is. starting in the initial string aiting on the input. p consumes from the input and enters and
accepting state( The content of the stac* at the time is irrele"ant(
Acc,2)*/c, <- E#2)- S)*c@
For each PDA pB<M. . . . #
D
. $
D
. F=(>e also define
C J <#
D
. . $
D
= <#. \. \= E
For any state #( That is N<P= is the set of input that P can consume and at the same time empty its stac*(
F!"# E#2)- S)*c@ T" F1/*3 S)*),
>ith the PDA diagram e ha"e to add one final state(
!x+ Let us design a PDA that process se#uence of if/s and else/s in a &
program. here i stands for if and e stands for else( >e shall use a stac*
symbol 8 to count the difference beteen the numbers of its seen so far. and
the number of e/s( This simple one state PDA is suggested by the transition diagram as follos(

The PDA is defined as. P
N
B C C#E.Ci.eE.C8E.
n
. #. 8=
The abo"e PDA can be changed as follos to accept the string by reaching the final state(
F1/*3 S)*), )" E#2)- S)*c@
From each accepting state e ha"e to add one more transition to a ne non-accepting state(
E723*1/ )+, ,F(1A*3,/c, "> DABS */9 CFGBS? ."!0
=!1), )+, 81#13*!1)1,8 <,):,,/ DABS */9 CFGBS? .10 M*!@80
The languages defined by PDA/s are exactly the context free languages( The goal is to pro"e that the
folloing three classes of languages(
The context free languages. that is the languages defined by &F'/s
The languages that are accepted by final state by some PDA(
The languages that are accepted by empty stac* by some PDA(
F!"# G!*##*!8 )" (8+9":/ A()"#*)*
'i"en a &F' '. e construct a PDA that simulates the leftmost deri"ations of '( Any left-sentential form
that is not a terminal string can be ritten as xA^. here A is the leftmost "ariable. x is hate"er terminals
appear to its left and ^ is the string of terminals and "ariables that appear to the right of A( >e call A^ the
tail of the left sentential form( If a left sentential form consists of terminals only. then its tail is .
The idea behind the construction of a PDA form a grammar is to ha"e the PDA simulate the se#uence of left
sentential forms that the grammar uses to generate a gi"en terminal string ( The trail of each sentential
form xA^ appears on the stac*. ith A at the top( At that time. x ill be ,represented- by our ha"ing
consumed x from the input. lea"ing hate"er of follos its prefix x( That is if B xy. then y ill remain
on the input(
Suppose the PDA is in an ID<#.y.A^=. representing left sentential form xA^( It guesses the production to use
to expand A. say AQ( The mo"e of the PDA us to replace A on the top of the stac* by `. entering
ID<#.y.`^=( Note that there is only one state #. for this PDA(
No <#.y.`^= may not be a representation of the next left-most form. because ` may has a prefix of terminals
( In fact ` may ha"e no "ariable at all. and ^ may ha"e a prefix of terminals( >hate"er terminals appear at
the beginning of `^ need to be remo"ed. to expose the next "ariables ate the top of the stac*( These
terminals are compared against the next input symbols. to ma*e sure our guesses at the leftmost deri"ation
of input string are correct. if not. this branch of the PDA dies deri"ation of . then e shall e"entually
reach the left-sentential form ( At that point. all the symbols on the stac* ha"e either been expanded or
matched against the input( The stac* is empty and e accept by empty stac*(
Let 'B<].T.M.S= be a &F'( &onstruct the PDA p that accepts1 L<'= by empty stac* as follos(
PB<C#E.T. ] T . .# .s=
>here transition function is defined by
A( For each "ariable A. <#. . A=B C<#. `=0A ` is a production of PE(
F( For each terminal a . <#.a.a=BC<#. =E
!xample+ Let us con"ert the expression grammar to PDA ( The grammar is
I a J b J Ia J Ib J ID J IA
! I J !:! J !3! J <!=
The set of terminals for the PDA is Ca.b.D.A<.=.3:E( The R symbols and the symbols I and ! form the stac*
alphabet( The transition function for the PDA is+
A( <#. . I= BC <#.a=. <#.b=. <#.Ia=. <#Ib=. <#.ID=. <#.IA= E
F( <#. . != B C <#. I=. <#.!3!=. <#.!:!=.<#.<!== E
H( <#.a.a=BC<#.=E1 <#.b.b=BC<#.=E1 <#.D.D=BC<#.=E1
N( <#.A.A=BC<#.=E1 <#.<.<=BC<#.=E1 <#.=.==BC<#.=E1
I( <#.3.3=BC<#.=E1 <#.:.:=BC<#.!=E1
F!"# DA )" 6!*##*!:
A PDA may change state as it pops stac* symbols. so e should also note the state that it enters hen if
finally pops a le"el off its stac*(
G1A, * <!1,> /"), "/ D,),!#1/18)1c (8+9":/ A()"#*)* .DDA0? .10 M*!@80
D18c(88 DDA <*8,9 "/ !,6(3*! ,72!,881"/, c"/),7) >!,, 3*/6(*6,8, */9 *#<16("(8 6!*##*!?
A PDA is deterministic if there is ne"er a choice of mo"e in any situation( These choices are of to *inds. of
<#.a.5= contains more than one pair. then surely the PDA is nondeterministic because e can choose among
these pairs hen decidedly on the next mo"e( 2oe"er e"en if <#.a.x= is alays a singleton. e could still
ha"e a choice beteen using a real input symbol. or ma*ing a ro on ( Thus e define a PDA
pB<M..a..#D.8
D
.F= to be deterministic . if and only if the folloing conditions are met(
A( <#.a.x= has at most one member for any # in M. a in or aB. and x in a(
F( If <#.a.x= is non empty. for some a in . then <#..x= must be empty(
!xample+ The language L
r
that has no DPDA( If e put a center-mar*er 6c/ in the middle. e can ma*e the
language recogni$able by a DPDA(
The strategy of the DPDA is to store D/s and A/s on its stac*. until it sees the context mar*ers c( If then goes
to another state. in hich it matches input symbols against stac* symbols and pops the stac* if they match( If
it e"er finds a non match it dies. its input cannot be of the form c
)
( If it succeeds in popping its stac*
don to the initial symbol. hich mar*s the bottom of the stac*. then it accepts its input(

The PDA is non deterministic. because on state #D. it alays has the choice of pushing the next input symbol
onto the stac* or ma*ing a transition on to state #A1 ie(. it has to guess hen it has reached the middle ( The
DPDA for L
cr
is shon in the folloing figure(
This PDA is clearly deterministic( It ne"er has a choice of mo"e in the same state. using the same input and
stac* symbol( As for choices beteen using a real input symbol or . the -transition it ma*es is from #A to
#F ith 8
D
at the top of the stac*( 2oe"er. in state #A. there are no other mo"es hen 8
D
is at the stac* top(
R,6(3*! L*/6(*6,8 */9 D,),!#1/18)1c DABS:
The DPDA/s accept a class of languages that is beteen the regular languages and the &FL/s( >e shall first
pro"e that he DPDA language include all the regular languages( If e ant the DPDA to accept by empty
stac*. then e find that our language recogni$ing capability is rather limited( Say that language L has the
prefix property if there are no to different strings x and y in L such that x is a prefix of y(
!xample+ That is . it is not possible for there to be to strings c
)
and xcx
)
. one of hich is a prefix of the
other. unless they are the same string( To see hy suppose c
)
is a prefix of xcx
)
. and x( Then must
be shorter than x( Therefore the c in c
)
comes in position here xcx
)
has a D or A1 it is a position in the
first x( That point ultraist the assumption that c
)
is a prefix of xcx
)
(
Ln the other hand. there are some "ery simple languages that do not ha"e the prefix property( &onsider CDE
:
.
that is the set of all string of D/s( &learly there are pairs of strings in this language one of hich is a prefix of
the other( So this language does not ha"e a prefix property( In fact of any to strings. one is a prefix of the
other. although the condition is stronger than e need to establish that the prefix property does not hold(
DDABS */9 C"/),7) F!,, L*/6(*6,8:
To see the language L
cr
is not regular. suppose it ere and use the pumping lemma( If n is a constant of the
pumping lemma. then consider the string BD
n
cD
n
. hich is in L
cr(
>hen e pump this string. it is the first
group of D/s hose length must change. so e get in L
cr
strings that ha"e the ,center- mar*er not in the
center( Since these strings are not in L
cr
e ha"e a contradiction and conclude that L
cr
is not regular(
Ln the other hand. there are &FL/s li*e L
cr
that cannot be L<P= for any DPDA P( A formal proof is
complex. but the intuition is transparent( If P is a DPDA accepting L
cr.
then gi"en a se#uence of D/s. it must
store them on the stac*. or do something to count the arbitrary number of D/s(
Suppose p has seen n D/s and the sees AAD
n
( It must "erify that there ere n D/s after the AA. and to do so it
must pop its stac*( No P has seen D
n
AAD
n
( If it sees the identical string next. it must accept. because the
complete input is of the form
)
. ith >BD
n
AAD
n
( 2oe"er if it sees D
m
AAD
m
for some mn. P must not
accept( Since its stac* is empty. it cannot remember hat arbitrary integer n as. and must fail to recogni$e
L
cr
correctly. our conclusion is that+
The languages accepted by DPDA/s by final state properly include the regular languages. but are
properly included in the &FL/s(
E/9 "> U/1) -3
U N I T - IV
E723*1/ *<"() )+, T(!1/6 #*c+1/,8? ."!0
=+*) 18 T(!1/6 M*c+1/,? E723*1/ 1)8 C"#2"/,/)8. .10 #*!@80
A problem that can/t be sol"ed by computer is called as ,undecidable- problems( Those problems can be
sol"ed using a ne computing de"ice called the Turing %achine(
N")*)1"/ >"! )+, T(!1/6 M*c+1/,
The Turing machine consists of a finite control hich can be in any of a finite set of states( There is a tape
di"ided into s#uares or cells1 each cell can hold any one of a finite number of symbols(
A Turing machine
Initially. the input string of symbols chosen from the input alphabet is placed on the tape(
All other tape cells initially hold a special symbol called the blan*(
The blan* is a tape symbol. but not an input symbol(
There is a tape head that is alays positioned at one of the tape cells(
The Turing machine is said to be scanning that cell(
Initially the tape head is at the leftmost cell that holds the inputs(
A mo"e of the T% is a function of the state of the finite control and the tape symbol scanned(
In one mo"e. the Turing machine ill+
A( &hange state( The next state optionally may be the same as the current state(
F( >rite a tape symbol in the cell scanned(
H( %o"e the tape head left or right(
The formal notation e shall use for a Turing machine<T%= is similar to that used for finite automata or
PDA/s ( >e describe a T% by the T-tuple
% B <M. V. . b. #
D.
P. F=
>hose components ha"e the folloing meanings+
L The finite set of state of the finite control(
R The finite set of input symbols(
The complete set of tape symbols(
S The transition function( The "alue of b<#. 5= if it is defined. is a triple <p. ;. D=. here +
p is the next state in M. ; is the symbol. in . D is a direction. either ,left- or ,right-(
F
0
The start state. a member of M. in hich the finite control is found initially(
B The blan* symbol(
F The set of finite or accepting state. a subset of M(
I/8)*/)*/,"(8 D,8c!12)1"/8
The string 5
A
5
F
G(5
i-A
# 5
i
5
i3A
(( 5
n
is used to describe the transition in Turing machines
A( # is the state of the Turing machine(
F( The tape head is scanning the i
th
symbol from the left(
H( 5
A
5
F
((5
n
is the portion of the tape beteen the leftmost and the rightmost nonblan*(
N( The mo"es of the Turing machine are described by notation( And. or ill be used
to indicate $ero. one. or more mo"es of the T%(
E7*#23,:
Let us design a Turing machine ill accept the language CD
n
A
n
J ncBAE( Initially. it is gi"en a finite se#uence
of D/s and A/s on its tape. preceded and folloed by infinity of blan*s( Alternately. the T% ill change a D
to an 5 and then a A to a ;. until all D/s and A/s ha"e been matched(
In more details. starting at the left end of the input. it repeatedly changes a D to an 5 and mo"es to the right
o"er hate"er D/s and ;/s it sees. until it comes to a A(It changes the A to a ;. and mo"es left. o"er ;/s and
D/s . until it finds an 5( At that point. it loo*s for a D immediately to the right. and if it finds one. changing it
to 5 and repeats the process. changing a matching A to a ;(
The formal specification of the T% is %B<C#
D
.#
A
.#
F
.#
H
.#
N
E.CD.AE.CD.A.5.;.PE. b. #D. P. C#NE=. >here b is
gi"en by the table in figure
T!*/81)1"/ D1*6!*#8
A transition diagrams consists of a set of nodes corresponding to the states of the T%( An arc from state # to
state p is labeled by one or more items of the form 5 0 ; D. here 5 and ; are tape symbols. and D is a
direction. either L or )( Start state is represented by the ord ,start- and an arro entering that state(
Accepting states indicated by double circles( This. the only information about the T% one cannot read
directly from the diagram is the symbol used for the blan*( >e shall assume that symbol is P unless e state
otherise( The transition diagram for the pre"ious example is gi"en belo(
E7*#23,
In this example e shall sho ho a Turing machine might compute the function .hich is called minus
or proper subtraction and is defined by m n Bmax<m-n.D= that is m n is m-n if m n and D if mdn(
A T% that performs this operation is specified by
%B<C#
D.
#
A
.GG(#
Q
E.CD.AE.CD.A.PE.b.#
D
.P=
Note that. since this T% is not used to accept inputs. e ha"e omitted the se"enth component. hich is the
set of accepting states( % ill start ith a tape consisting of D
m
AD
n
surrounded by blan*s( % halts ith D
m-n
on its tape. % repeatedly finds its leftmost remaining D and replaces it by a blan*( It then searches right.
loo*ing for a A( After finding a A. it continues right. until it comes to a D. hich it replaces by a A( % then
returns left. see*ing the leftmost D. hich it identifies hen it first meets a blan* and then mo"es one cell to
the right( The repetition ends if either+
A( Searching right for a D. % encounters a blan*( Then the n D/s in Dm

AD
n
ha"e all been changed to
A/s and m3A of the m D/s ha"e been changed to P( % replaces the n3A A/s by one D and n P/s
lea"ing m-n D/s on the tape( Since mcBn in this case. m-nBm-n(
F( Peginning the cycle. % cannot find a D to change to a blan*. because the first m D/s already ha"e
been changed to P( Then ncBm. so m-nBD( % replaces all remaining A/s and D/s by P and ends
ith a completely blan* tape(
The folloing fig shos the rules of the transition function b and e ha"e also represented b as a transition
diagram(
The folloing is a summary of the role played by each of the se"en states+
F
0
This state begins the cycle. and also brea*s the cycle hen appropriate( If % is scanning a D. the
cycle must repeat( The D is replaced by P the scanning P. then all possible matches beteen the to
groups of D/s on the tape ha"e been made. and % goes to state #
I


to ma*e the tape blan*(
F
1
In this state. % searches right. through the initial bloc* of D/s loo*ing for the leftmost hen found. %
goes to state #F(
F
4

% mo"es right. s*ipping o"er A/s until it finds a D( It changes that D to a A. turns leftard. and enters
state( 2oe"er it is also possible that there are no more D/s left after the bloc* of A/s(
F
3

% mo"es left. s*ipping o"er D/s and A/s until it finds a blan*( >hen it finds P. it mo"es right and
returns to state #
D
. beginning the cycle again(
F
&

2ere. the subtraction is complete. but one unmatched D in the first bloc* as incorrectly changed to
a P( % therefore mo"es left. changing A/s to P/s until it encounters a P on the tape( It changes that
P bac* to D. and enters state #
Q
herein % halts(
F
5
State #
I
is entered from #
D
hen it is found that all D/s in the first bloc* ha"e been changed to P( In
this case. described in <F= abo"e. the result of the proper subtractions is D( % changes all remainiRng
D/s and A/s to P and enters state #
Q
(
F
T
The sole purpose of this state is to allo % to halt hen it has finished its tas*( If the subtraction
had been a subroutine of some more complex function. then #Q ould initiate the next step of that
larger computation(
L*/6(*6, "> T(!1/6 #*c+1/, */9 1)8 H*3)1/6
The input string is placed on the tape. and the tape head begins at the leftmost input symbol( If the T%
e"entually enters an accepting state. then the input is accepted. and otherise not( Let % B <M. V.b
D
.P.F= be
a Turing machine( Then L<%= is the set of strings in such that for some state p in F and any tape strings ^
and `( This definition as assumed hen e discussed the Turing machine of
!xample hich accepts strings of the form D
n
A
n
( The set of languages e can accept using a Turing machine
is often called the recursi"ely enumerable languages or )! languages(
There is another notation of ,acceptance- that is commonly used for Turing machines acceptance by halting(
>e say a T% halts if it enters a state #. scanning a tape symbol x. and there is no mo"e in this situation( i(e(.
b<#.5= is undefined(
G1A, 8+"!) /"),8 "/ !"6!*##1/6 T,c+/1F(,8 >"! T(!1/6 M*c+1/,8. ."!0
E723*1/ )+, !"6!*##1/6 T,c+/1F(,8 1/A"3A,9 1/ T(!1/6 M*c+1/,8. .10 M*!@80
The ability of a T% is denoted by
A( Storage in the State
F( %ultiple trac*s
H( Subroutines
S)"!*6, 1/ )+, S)*),
The finite control is used not only to represent a position in
the ,program- of the Turing machine. but to hold a finite amount of data( 2ere e see the finite control
consisting of not only a ,control- state # but three data elements A. P. and &( The techni#ue re#uires no
extension to the T% model and is considered as a tuple( >e shall design a T%
%B<M.CD.AE.CD.A.PE.b.7#
D
.P9.C7#
A
.P9E=
That remembers in its finite control the first symbol that it sees. and chec*s that it does not appear elsehere
on its input( Thus. % accepts the language DA:3AD:( Accepting regular languages such as this one does not
stress the ability of Turing machines but it ill ser"e as a simple demonstration(
The set of states M is C#
D
.#
A
E CD.A.PE( T at is the states may be thought of as pairs ith to
components(
a= A control portion #
D
or #
A
that remembers hat the T% is doing control state #
D
indicates that %
has not yet read its first symbol hile #A indicates that it has read the symbol. and is chec*ing
that it does not appear elsehere by mo"ing right and hoping to reach a blan* cell(
b= A data portion .hich remembers the first symbol seen. >hich means be D the transition function
b of % is as follos+
A( b<7#
D.P
9.a=B<7#
A
.a9.a.)= for aBD or aBA( Initially. #
D
is the control state .and the data portioDn
the is P(. the symbol scanned is copied into the copied into the second component of the
state. and % mo"es right. entering control state #
A
as it does so(
F( <#
A
.a9.a= B <7#
A
.a9.a.)= here a is the ,complement- of a. that is. D if aBA and A if aBD( In
state #A. % s*ips o"er each symbol D or A that is different from the one it has stored in its
state. and continuous mo"ing right(
H( b<7#
A
.a9.P= B<7#
A
P9.P.)= for aBD or aBA( If % reaches the first blan*. it enters the accepting
state7#
A.
P9(
M(3)123, T!*c@8
Turing machine consists of se"eral trac*s( !ach trac* can hold one symbol. and the tape alphabet of the T%
consists of tuples. ith one component for each ,trac*-( A common use of multiple trac*s is to treat one
trac* as holding the data and a second trac* as holding a mar*( In the present example. e shall use a
second trac* explicitly to recogni$e the non-context-free language
L
c
BCc J is in <D3A=
3
E
The Turing machine e shall design is+
%B<M. . e. b. 7#A.P9. 7P.P9. C7#S.P9E=
>here+
L The set of states is C#A.#F.GG#SECD.AE that is pairs consisting of a control state #
i
and a data
component D or A(
U The set of tape symbols is CP.:ECD.A.c.P=( The first component or trac* can be either blan* or
chec*ed represented by the symbols P and : respecti"ely
The input symbols are 7P.D9 and 7P.A9 identified ith D and A respecti"ely(
The transition function b(
S(<!"()1/,8
A Turing machine subroutine is a set of states that perform some useful process( This set of states includes a
start state and another state that temporarily has no mo"es. and that ser"es as the ,return- state to pass
control to hate"er other set of sates called the subroutine( The ,call- of a subroutine occurs hene"er
there is a transition to its initial state( Since the T% has no mechanism for remembering a ,return address-.
that is. a state to go to after it finishes. should our design of a T% call for one subroutine to be called from
se"eral states. e can ma*e copies of the subroutine. using a ne set of states for each copy( The ,calls- are
made to the start states of different copies of the subroutine. and each copy ,returns- to a different state(
E7*#23,: >e shall design a T% to implement the function ,multiplication-( That is. our T% ill start ith
D
m
AD
n
on its tape. and ill end ith D
mn
on the tape( An outline of the strategy is+
A( The tape ill in general ha"e one nonblan* string of the form D
i
AD
n
AD
*n
for some *(
F( In one basic step. e change a D in the first group to P and add n D/s to the last group. gi"ing us a
string of the form D
i-A
AD
n
AD
<*3A=n
(
H( As a result. e copy the group of n D/s to the end m times. once each time e change a D in the first
group to P( >hen the first group of D/s completely changed to blan*s. there ill be mn D/s in the last
group(
N( The final step is to change the leading AD
n
A to blan*s. and e are done(
The heart of this algorithm is a subroutine. hich e call copy( This subroutine implements step <F= abo"e
copying the bloc* of n D/s to the end( &opy con"erts an ID of the form D
m-*
A#
A
D
n
AD
<*-A=n
to ID D
m-*
A#
I
D
n
AD
*n
(
The folloing figure shos the transitions of subroutine copy(
This subroutine mar*s the first D ith an 5. mo"es right in state #
F

until it finds a blan*. copies the D
there. and mo"es left in state #
H
to find the mar*er 5(
It repeats this cycle until in state #A it finds a A instead of a D( At that point. it uses state #
N
to change
the 5/s bac* to D/s and ends inn state #
I
(
The complete multiplication Turing machine starts in state #
D
( The first thing it does is go. in se"eral
steps. from ID #oD
m
AD
n
to ID D
m-A
A#
A
D
n
A( The transitions needed are shon in the portion of figure to
the left of the subroutine call1 these transitions in"ol"e states #
D
and #
Q
only(
The purpose of state #T.#R and #S is to ta*e control after copy has Kust copRied a bloc* of n D/s and is
in ID D
m-*
AD
n
AD
*n
( !"entually. these states bring us to state #
D
om-*AD
n
(
At that point the cycle starts again and copy is called to copy the bloc* of n D/s again( As an
exception in state #R the T% may find that all m D/s ha"e been changed to blan*s(
In that case. a transition to state #
AD
occurs( This state. ith the help of state #
AA
. changes the leading
AD
n
A to blan*s and enters the halting state #
AF
(
At this point. the T% is in ID #
AF
D
mn
and its Kob is done(
=!1), 8+"!) /"),8 "/ E7),/81"/8 )" )+, B*81c T(!1/6 M*c+1/,. ."!0
E723*1/ )+, A*!1"(8 #"9,38 "> B*81c T(!1/6 M*c+1/,. .10 M*!@80
Nondeterministic Turing machine. an extension of the basic model that is alloed to ma*e any of a finite set
of choices of mo"e in a gi"en situation( This extension also ma*es ,programming- Turing machines easier.
but adds no language-defining poer to the basic model(
M(3)1)*2, T(!1/6 M*c+1/,8
The de"ice has a finite control and some finite number of tapes( !ach tape
is di"ided into cells. and each cell can hold any symbol of the finite tape
alphabet( As in the single tape T%. the set of tape symbols includes a
blan*. and has a subset called the input symbols. of hich the blan* is not
a member( The set of states includes an initial state and some accepting
states( Initially+
A( The input. a finite se#uence of input symbols is placed on the
first tape(
F( All other cells of all the tapes hold the blan*(
H( The finite control is in the initial state(
N( The head of the first tape is at the left end of the input(
I( All other tape heads are at some arbitrary cell(
A mo"e of the multitape T% depends on the state and the symbol scanned by each of the tape heads( In one
mo"e the multitape T% does the folloing+
A( The control enters a ne state. hich could be the same as the pre"ious state(
F( Ln each tape. a ne tape symbol is ritten on the cell scanned( Any of these symbols may be the
same as the symbol pre"iously there(
H( !ach of the tape heads ma*es a mo"e. hich can be left. right or stationary( The heads mo"e
independently. so different heads may mo"e in different directions. and some may not mo"e at
all(
%ultitape Turing machines li*e one-tape T%/s accept by entering an accepting state(
EF(1A*3,/c, "> O/,-T*2, */9 M(3)1)*2, TMB8
)ecursi"ely enumerable languages are defined to be those accepted by a one-tape T%( %ultitape T%/s
accept all the recursi"ely enumerable languages. since a one tape T% is multitape T%(
R(//1/6 T1#, */9 )+, M*/--T*2,8-)"-"/, C"/8)!(c)1"/
The running time of T% % on input is the number of steps that % ma*es before halting( If % doesn/t halt
on . then the running time of % on is infinite( The time complexity of T% % is the function T<n= that is
the maximum o"er all inputs of length n. of the running time of % on ( For Turing machines that do not
halt on all inputs T<n= may be infinite for some or e"en all n(
The constructed one tape T% may ta*e much more running time than the multiStape T%( 2oe"er. the
amounts of time ta*en by the to Turing machines are commensurate in a ea* sense+ the one-tape T%
ta*es time that is no more than the s#uare of the time ta*en by the other( >hile ,s#uaring- is not a "ery
strong guarantee. it does preser"e polynomial running time(
a= The difference beteen polynomial time and higher groth rates in running time is really the
di"ide beteen hat e can sol"e by computer and hat is in practice not sol"able(
b= Despite extensi"e research. the running time needed to sol"e many problem has not been
resol"ed closer than to ithin the same polynomial
N"/9,),!#1/18)1c T(!1/6 M*c+1/,8
A nondeterministic Turing machine differs from the deterministic "ariety e ha"e been studying by ha"ing a
transition function such that for each state # and tape symbol x. b<#.5= is a set of triples
C<#A.;A.DA=.<#F.;F.DF.=.G(.<#
*.
.;
*
.D=Ehere * is any finite integer(
The language accepted by an NT% % is defined in the expected manner. in analogy ith the other
nondeterministic de"ices. such as N FA/s and PDA/s that e ha"e studied( That is % accepts an input if
there is any se#uence of choices of mo"e that leads from the initial ID ith as input. to an ID ith an
accepting state(
The NT%/s accept no languages not accepted by a deterministic T%( The proof in"ol"es shoing that for
e"ery NT% %
N
e can construct a DT% %
D
that explores the ID/s that %N can reach by any se#uence of its
choices(If %
D
finds one that has an accepting state. then %
D
enters an accepting state of its on( %
D
must be
systematic. putting ne ID/s on a #ueue rather than a stac*. so that after some finite time %
D
has simulated
all se#uences up to mo"es of %
N
for *BA.F.G(
E723*1/ R,8)!1c),9 T(!1/6 M*c+1/,8 :1)+ ,7*#23,. ."!0
G1A, * <!1,> /"), "/ R,8)!1c),9 T(!1/6 M*c+1/,8 :1)+ ,7*#23,. .10 M*!@80
First. e restrict the tapes of the T% to beha"e li*e stac*s( Then. e further restrict the tapes to be
,counters- that is. they can only represent one integer. and the T% can only distinguish a count of D from
any non$ero count(
T(!1/6 M*c+1/,8 :1)+ S,#1-1/>1/1), T*2,8
>e can assume the tape is semi-infinite that is there are no cells to the left of the initial head position( In the
next theorem. e shall gi"e a construction that shos a T% ith a semi-infinite tape can simulate one hose
tape is. li*e our original T% model. infinite in both directions(
The tric* behind the construction is to use to trac*s on the
semi-infinite tape( The upper trac* represents the cells of the
original T% that are at or to the right of the initial head position(
The loer trac* represents the positions but in re"erse order(
The exact arrangement is suggested in figure(
The upper trac* represents cells 5.5.G(.here 5
D
is the initial position of the head 5
A
.5
F
. and so on. are the
cells to right( &ells 5-A.5-F and so on( )epresent cells to the left of the initial position( !nd mar*er and
pre"ents the head of the semi-infinite T% from accidentally falling off the left end of the tape(
>e can ma*e one more restriction to our Turing machine it ne"er rites a blan*( This simple restriction.
coupled ith the restriction that the tape is only semi-infinite means that the tape is at all times a prefix of
nonblan* symbols folloed by infinity of blan*s( Further. the se#uence of non blan*s alays begins at the
initial tape position(
M(3)1 8)*c@ #*c+1/,
>e no consider a class of machines called ,counter
machines(- These machines ha"e only the ability to store a
finite number of integers <,counter-=. and to ma*e different
mo"es depending on hich if any of the counters are
currently D( The counter machine can only add or subtract
one from the counter. and cannot tell to different non$ero
counts from each other in effect. a counter is li*e a stac* on
hich e can place only to symbols+ a bottom-of-stac*
mar*er that appears only at the bottom and one other symbol that may be pushed and popped from the stac*(
A *-stac* machine is a deterministic PDA ith * stac*s( It obtains its input. li*e the PDA does. from an
input source. rather than ha"ing the input placed on tape or stac*. as the T% does( The multi stac* machine
has a finite control. hich is in one of a finite set of states( It has a finite stac* alphabet. hich it uses for all
its stac*s( A mo"e of the multi stac* machine is based on+
The state of the finite control(
The input symbol read. hich is chosen from the finite input alphabet( Alternati"ely. the multi stac*
machine can ma*e a mo"e using input. but to ma*e the machine deterministic. there cannot be a
choice of an -mo"e or a non--mo"e in any situation(
The top stac* symbol on each of its stac*s(
In one mo"e. the multistac* machine can+
&hange to a ne state(
)eplace the top symbol of each stac* ith a string of $ero or more stac* symbols( There can be < and
usually is = a different replacement string for each stac*(
Thus. a typical transition rule for a *-stac* machine loo*s li*e+
<#.a.5A.5FG5
*
=B<p.;A.;FGG.;
*
=
The interpretation of the rule is that state #. ith 5
i
on top of the i
th
stac*. for iBA.FG(.*. the machine may
consume a <either an input symbol or f = from is input. go to state p. and replace 5
i

on top of the i
th
stac* by
string ;
i
for each iBA.F.G.*( The multistac* machine accepts by entering a final state(
>e add one capability that simplifies input processing by this deterministic machine+ e assume there is a
special symbol g. called the end-mar*er that appears only at the end of the input and is not part of that input(
The presence of the end mar*er allos us to *no hen e ha"e consumed all the a"ailable input. e shall
see in the next theorem ho the end mar*er ma*es it easy for the multistac* machine to simulate a Turing
machine( Notice that the con"entional T% needs no special end mar*er. because the first blan* ser"es to
mar* the end of the input(
C"(/),! M*c+1/,8
A counter machine may be thought of in on of to ays+
A= The counter machine has the same structure as the multistac* machine. but in place of each stac* is a
counter( &ounters hold any nonnegati"e integer. but e can only distinguish beteen $ero and
non$ero counters( That is. the mo"e of the counter machine depends on its states. input symbol. and
hich. if any. of the counters are $ero( In one mo"e the counter machine can+
a= &hange state(
b= Add or subtract A from any of its counters independently( 2oe"er. a counter is not alloed to
become negati"e. so it cannot subtract A from a counter that is currently D(
F= A counter machine may also be regarded as a restricted multistac* machine( The restrictions are as
follos+
a= There are only to stac* symbols. hich e shall refer to as 8
D
<the bottom-of-stac* mar*er=.
and 5(
b= 8
D
is initially on each stac*(
c= >e may replace 8
D
only by a string of the form 5
i
8
D
for some icBD(
d= >e may replace 5 only by 5
i
for some icBD( That is. 8
D
appears only on the bottom of each
stac*. and all other stac* symbols. if any. are 5(
>e shall use definition <A= for counter machines. but the to definitions clearly define machines of
e#ui"alent poer( The reason is that stac* 5
i
8
D
can be identified ith the count *( In definition <F=. e can
tell count D from other counts. because for count D e see 8
D
on top of the stac*. and otherise e see 5(
T+, ":,! "> C"(/),! M*c+1/,8
The obser"ations about the languages accepted by counter machines are as follos+
!"ery language accepted by a counter machine is recursi"ely enumerable( The reason I so that a
counter machine is a special case of a stac* machine. and a stac* machine is a special case of a multi
tape Turing machine. hich accepts only recursi"ely enumerable languages(
!"ery language accepted by a non-counter machine is a &FL( Note that a counter. in point-of-"ie
<F=. is a stac*. so a one-counter machine is a special case of one-stac* machine. i(e(. a PDA( In fact.
the languages of one-counter machines are accepted by deterministic PDA/s. although the proof is
surprisingly complex( The difficulty in the proof stems from the fact that the multistac* and counter
machines ha"e an end mar*er at the end of their input(

=!1), 8+"!) /"),8 "/ T(!1/6 #*c+1/,8 */9 c"#2(),!8. ."!0
E723*1/ )+, 8),28 1/A"3A,9 1/ )+, 81#(3*)1"/ "> T(!1/6 #*c+1/,8 <- c"#2(),!8. .10 M*!@80
The claims of T% and computers can be di"ided into to parts+
A( A computer can simulate a Turing machine(
F( A Turing machine can simulate a computer. and
can do so in an amount of time that is at most
some polynomial in the number of steps ta*en
by the computer(
S1#(3*)1/6 * T(!1/6 M*c+1/, <- C"#2(),!

'i"en a particular T% %. e must rite a program
that acts li*e %( one aspect of % is its finite control(
Since there are only a finite number of states and a
finite number of transition rules. our program can
encode states as character strings and use a table of
transitions. hich it loo*s up to determine each mo"e(
Li*eise. the tape symbols can be encoded as character strings of a fixed
length. since there are only a finite number of tape symbols( A serious
#uestion arises hen e consider ho out program is to simulate the Turing-
machine tape( This tape can gro infinitely long. but the computer/s
memory-main memory dis*. and other storage de"ices-are finite(
If there is no opportunity to replace storage de"ices. then in fact e cannot1 a
computer ould then be a finite automaton. and the only languages it could
accept ould e regular( 2oe"er. common computers ha"e sappable storage de"ices. perhaps a- $ip-
dis*. for example( Since there is no ob"ious limit on ho many dis*s e could use. let us assume that as
many dis*s as the computer needs is a"ailable( >e can thus arrange that the dis*s are placed in to stac*s as
shon in the figure(
Lne stac* holds the data in cells of the Turing machine tape that are located significantly to the left of the
tape of the tape head( And the other stac* holds data significantly to the right of the tape head( If the tape
head of the T% mo"es sufficiently far to the left that it reaches cells that are not represented by the dis*
currently mounted in the computer then it prints a message ,sap left- the currently mounted dis* is
remo"ed by a human operator and placed on the top of the right stac*( The dis* on top of the left stac* is
mounted in the computer. and computation resumes(
Similarly. if the T%/s tape head reaches cells so far to the right that these cells are not represented by the
mounted dis*. then a ,sap right- message is printed( The human operator mo"es the currently mounted
dis* to the top of the left stac*. and mounts the dis* on top of the right stac* in the computer( If either stac*
is empty hen the computer as*s that a dis* from that stac* be mounted. then the T% has entered an all-
blan* region of the tape( In that case. the human operator must go toe the store and by a first dis* to mount(
S1#(3*)1/6 * C"#2(),! <- * T(!1/6 M*c+1/,
The folloing Figure suggests ho the Turing machine ould be designed ould be designed to simulate a
computer( This T% uses se"eral tapes. but it could be con"erted to a one-tape T% using the construction of
multi-tape Turing machine
The first tape represents the entire memory of the computer( >e ha"e used a code in hich addresses of
memory ords. in numerical order. alternate ith the contents are ritten in binary( The mar*er symbols :
and hare used to ma*e it easy to find the ends of addresses and contents. and to tell hether a binary string is
am address or contents(
The second tape is the ,instruction counter(- This tape holds one integer in binary. hich represents one of
the memory locations on tape A( The "alue stored in this location ill be interpreted as the next computer
instruction to be executed(
The third tape holds a ,memory address- or the contents of that address after the address has been located on
tape A( To execute an instruction. the T% find the contents of one or more memory addresses that holds data
in"ol"ed in the computation( Lur T% ill simulate the instruction cycle of the computer. as follos(
A( Search the first tape for an address that matches the instruction number on tape F( >e start at the g on
the first tape. and mo"e right. comparing each address ith the contents of tape F(
F( >hen the instruction address is found. examine its "alue( Let us assume that hen a ord is an
instruction. its first fe bits represent the action to be ta*en <e(g( copy. add. branch=. and the
remaining bits code an address or addresses(
H( If the instruction re#uires the "alue of some address copy that address on to the third tape and mar*
the position of the instruction. using a second trac* of the first tape(
N( !xecute the instruction. or the part of the instruction in"ol"ing the "alue(
I( After performing the instruction. and determining that the instruction is not a Kump. add A to the
instruction counter on tape F and begin the instruction cycle again(
The fourth tape holds the simulate input to the computer. since the computer must read its input from a file(
A scratch tape is also shon( Simulation of some computer instructions might ma*e effecti"e use of a
scratch tape or tapes to compute arithmetic operations such as multiplication( Finally. e assume that the
computer ma*es an output that tells hether or not its input is accepted(
C"#2*!1/6 )+, R(//1/6 T1#,8 "> C"#2(),!8 */9 T(!1/6 M*c+1/,8
The running time for the Turing machine that simulates a computer are as follos+
The issue of running time is important because e shall use the T% not only to examine the #uestion
of hat can be computed at all. but hen can be compared ith enough efficiency(
The di"iding line beteen the tractable O that hich can be sol"ed efficientlyifrom the intractable O
problems that can be sol"e. but not fast enough for the solution to be usable Ois generally held to be
beteen hat can be computed in polynomial time and hat re#uires more than any polynomial
running time(
Thus. e need to assure oursel"es that if a problem can be sol"ed in polynomial time on a typical
computer. then it can be sol"ed in polynomial time by a Turing machine. and con"ersely(
Thus the T% described abo"e can simulate n steps of a computer in D<n
H
= time. e need to confront the issue
of multiplication as a computer instruction(
E723*1/ )+, 1/>"!#*3 9,>1/1)1"/ "> 2(8+9":/ *()"#*)* :1)+ ,7*#23,? ."!0

E723*1/ DA :1)+ ,7*#23,? .5 #*!@80
The pushdon automation is a nondeterministic finite automation ith - transitions permitted and one
additional capability. a stac* on hich it can store a string of ,stac* symbols-( The PDA can only access the
information on its stac* in a first in first out ay( It recogni$es only the context free languages( >hile there
are many languages. that are context free. including some e ha"e seen that are not regular languages. there
are also some simple-to-describe languages that are not context free( An example of a non-context free
language isCD
n
A
n
F
n
$

n AE the set of strings consisting of e#ual groups of D/s. A/s and F/s(

A ,finite state control-. reads input. one symbol at a
time( The pushdon automation is alloed to
obser"e the symbol at the top of the stac* and to
base its transition on its current state. the input
symbol. and the symbol at the top of the stac*( Alternati"ely. it ma*es a ,spontaneous- transition. using as
its input instead of an input symbol( In one transition. the pushdon automation
A( &onsumes from the input the symbol that it uses in the transition( If is used for the input then no input
symbol is consumed(
F( 'oes to a ne state. hich may us may not be the same as the pre"ious state(
H( )eplace the symbol at the top of the stac* by any string( The string could be . hich &orresponds to a
pop of the stac*( It could be the same symbol that appeared at the top of the stac* pre"iously that is no
change to the stac* is made(
E7*#23,: Let us consider the language L
r
B C
)
J is in <D3A=:E
This language. often referred to as ,--re"ersed- is the e"en-length palindromes o"er alphabet CD.AE(
>e can design an informal pushdon automation accepting L
r.
as follos(
A( Start in a state #
D
that represents a ,guess- that e ha"e not yet seen the middle that is e ha"e not
seen the end of the string that is to be folloed by its on re"erse(
F( >hile in state #
D
. e read symbols and store them on the stac*. by pushing a copy of each input
symbol on to the stac*(
H( At anytime. e may guess that e ha"e seen the middle that is the end of ( At this time ill be
on the stac* ith the right end of at the top and the left end at the bottom( >e signify this choice
by spontaneously going to state #
A
( Since the automation is non deterministic e actually ma*e both
guesses( >e guess e ha"e seen the end of . but e also slay in static go and continue to read input
and store them on the stac*(
N( Lnce in state #
A
. e compare input symbols ith the symbol at the top of the stac*( If they match. e
consume the input symbol. pop the stac*. and proceed( If they do not match. e ha"e guessed rong.
our guessed as not folloed by
)
( This branch dies. although other branches of the non
deterministic automation may sur"i"e and read to acceptance(
I( If e empty the stac*. then e ha"e ended seen some input folloed by
)
( >e accept the output
that as read up to this point(
E/9 "> U/1) -&
U N I T - V
E723*1/ )+, c"/c,2) "> 1/)!*c)*<131)- )+,"!- <*8,9 "/ 2"3-/"#1*3 )1#,? ."!0
D,8c!1<, )+, c3*88,8 */9 N 2!"<3,# (81/6 1/)!*c)*<131)- )+,"!-? .10 M*!@80
>e introduce the basic components of intractability theory+ the classes P and NP of problems sol"able in
polynomial time by deterministic and nondeterministic T%/s. respecti"ely. and the techni#ue of polynomial
time reduction(
!"<3,#8 8"3A*<3, 1/ 2"3-/"#1*3 )1#,: A Turing machine % is said to be of time complexity T<n= mo"es.
regardless of hether or not % accepts( This definition applies to any function T<n=.such as T<n=BIDn
F
or
t<n=BH
n
3In
N
1e shall be interested predominantly in the case here T<n= is a polynomial in n( e say a
language L is in class p if there is some polynomial T<n= such that LBL<%= for some deterministic Tm % of
time complexity T<n=(
A/ ,7*#23,: @!(8@*3B8 *36"!1)+#: %any problems that ha"e efficient solutions1
perhaps you studied some in a course on data structures and algorithms( These
problems are generally in P( e shall consider one such problem finding there(
Informally e thin* of graphs as diagram such as that of fig
There are nodes. hich are numbered A-N in this example graph( And there are edges beteen some pairs of
nodes( !ach edge has a eight. hich is an integer( A spanning tree is a subset of the edges such that all
nodes are connected through these edges( ;et there are no cycles( An example of a spanning tree is shon by
bold edges( A minimum-eight spanning tree has the least possible total edge eight of all spanning trees(
There is a ell-*non ,greedy- algorithm. called jrus*al/s Algorithm. for finding a %>ST( 2ere is an
informal outline of the *ey ideas+
A( %aintain for each node the connected component in hich the node appears. using hate"er edges
of the tree ha"e been selected so far( Initially. no edges are selected. so e"ery node is in it/s a
connected component by itself(
F( &onsider the loest-eight edge that has not yet been considered1 brea* ties any ay you li*e( If this
edge connects to nodes that are currently in different connected components then+
o Select that edge for the spanning tree. and
o %erge the to connected components in"ol"ed(
H( &ontinue considering edges until either all edges ha"e been considered. or the number of edges
selected for the spanning tree is one less that the number of nodes( Note that in the latter case. all
nodes must be in one connected component and e can stop considering edges(
E7*#23,: In the graph e first consider the edge<A+H= because it has the loest eight. AD( Since A and H
are initially in different components. e accept this edge and ma*e A and H ha"e the same component
number say ,component A-( The next edge in order of eights <F.H=. ith eight AF. since F and H are in the
different components. e accept this edge and merge node F into ,component A-( The third edge is <A.F=.
ith eight AI( 2oe"er. A and F are no in the same component. so e reKect this edge and proceed to the
fourth edge. <H.N=( Since N is not in ,component A-( >e accept this edge. no. e ha"e three edges for the
spanning tree of a N-node graph. and so may stop(
N"/9,),!#1/18)1c 2"3-/"#1*3 )1#,
Formally. e say a language L is in the class NP( if there is a nondeterministic T% and hen % is gi"en an
input of length n. there are no se#uence of more than T<n= mo"es of %( Lur first obser"ation is that. since
e"ery deterministic T% is a nondeterministic T% that happens ne"er to ha"e a choice of mo"es. P NP(
2oe"er. it appears NP contains many problems not in p( The intuiti"e reason is that a NT% running in
polynomial time has the ability to guess an exponential number of possible solutions to a problem and chec*
one in polynomial time. ,in parallel-( 2oe"er it is one of the deepest open #uestions of %athematics
hether pBNP(>hether in fact e"erything that can be done in polynomial time by a NT% can in fact are
done by DT% in polynomial time. perhaps ith a higher-degree polynomial(
T+, T!*A,31/6 8*3,8#*/ 2!"<3,#
The input to TSP is the same as to %>ST. a graph ith integer eights on the edges such as that of
fig and a eight limit >( the #uestion as*ed is hether the graph as a ,2amilton circuit- of total eight at
most >( A 2amilton circuit is a set of edges ith each node appearing exactly once( Note that the number of
edges on a 2amilton circuit must e#ual the number of nodes in the graph(
E7*#23,: The graph of fig actually has only one 2amilton circuit the cycle<A.F.N.H.A=( The total eight of
the cycle is AI3FD3AR3ADBQH( Thus. if > is QH or more. the anser is ,yes-. and if >dQH the anser is
,no-(
2oe"er. the TSP on four-node graph is decepti"ely simple. since there can ne"er be more than to
different 2amilton circuits once e account for the different nodes at hich the same cycle can start. and for
the direction in hich e tra"erse the cycle( In m-node graphs. the number of distinct cycles gros as
L<mk=( the factorial of m. hich is more than Fcm for any constant c(
It appears that all ays to sol"e the TSP in"ol"e trying essentially all cycles and computing their
total eight( Py being cle"er. e can eliminate some ob"iously bad choices( Put it seems that no matter
hat e do. e must examine an exponential number of cycles before e can conclude that there is none
ith the desired eight limit >. or to find one if e are unluc*y in the order in hich e consider the
cycles( Ln the other hand . if e had a nondeterministic computer. e could guess a permutation of the
nodes. and compute the total eight for the cycle of nodes in that order( If there ere a real computer that
as nondeterministic. no branch
A similar amount of time( Thus. a single Otape NT% can sol"e the TSP in L <nN= time at most ( >e conclude
that the TSP is in NP(
"3-/"#1*3 'T1#, R,9(c)1"/8
Lur principal methodology for pro"ing that a problem PF cannot be sol"ed in polynomial time <i(e(
PF is not in P= is the reduction of a problem PA. hich is *non. not be in P. to PF(F( the approach as
suggested in fig hich e reproduce here as
Suppose e ant to pro"e the statement+ if PF is in P. then so is PA-( Since e claim that PA is not in P.
either( 2oe"er. the mere existence of the algorithm labeled ,&onstruct- in fig AD(F is not sufficient to pro"e
the desired statement(
For instance. suppose that hen gi"en an instance of PA of length m . the algorithm produced and
output string of length Fm. hich it fed to the hypothetical polynomial Otime algorithm for PF( if that
decision algorithm ran in .. say. time L<n *=. then on an input of length Fm it ould run in time L<F*m=.
hich is exponential in m( Thus. the decision algorithm for P# ta*es. hen gi"en an input of length m. time
that is exponential in m( these facts are entirely consistent ith the situation here PF is in P and PA is not in
P(
!"en if the algorithm that constructs a PF instance from a PF instance from a PA instance alays
produces an instance that is polynomial in the si$e of its input. e can fail to reach our desired conclusion(
For instance. suppose that the instance of PF constructed is of the same si$e. m. as the PA instance. but the
construction algorithm for PF that ta*es polynomial time L<n
*
= on input of length n only implies that there is
a decision algorithm for PA that ta*es time L<Fm3m*= on input of length m( this running time bound ta*es
into account the fact that e ha"e to perform the translation to PF as ell as sol"e the resulting PF instance(
Again it ould be possible for PA to be in P and PF not(
Thus. in the theory of intractability e shall use polynomial Otime reductions only( A reduction from
pA tD pF is polynomial Otime if it ta*es time that is some polynomial in the length of ht epA instance( Note
that as a conse#uence. the pF instance ill be of a length that is polynomial in the length of the pA instance(
N-C"#23,), !"<3,#8
Let L be a language <problem= in NP e say L is NP Ocomplete if the folloing statements are true
about L(L is in NP( For e"ery language LA in NP there is a polynomial-time reduction of D to L(
An example of an NP-complete problem. as e shall see. is the Tra"eling salesman problem. hich
e introduced already( Since it appears that PUNP. and in particular. all the NP-complete problems are in
NP-P. e generally "ie a proof of NP-completeness for a problem as a proof that the problems is not in P(
>e shall pro"e our first problem. called SAT <for Poolean satisfiability= to be NP-complete by
shoing that the language of e"ery polynomial-time NT% has a polynomial Otime reduction to SAT(
2oe"er. once e ha"e some NP-complete by reducing some *non NPBcomplete problem to it. using a
polynomial-time reduction(
E723*1/ */ N-C"#23,), !"<3,#?."!0 .10
M*!@80
D,8c!1<, *<"() )+, N c"#23,), 2!"<3,#?
This problem hether a Poolean expression is satisfiableiis pro"ed NP-complete by explicitly
reducing the languages of any nondeterministic. polynomial-time T% to the satisfiability problem(
The satisfiability problem
The Poolean expression is built from+
A( ]aribles hose "alues are Poolean+ i(e(. they either ha"e the "alue A<true=or D false(
F( Pinary operators l and ] standing for the logical AND or L) of the to expressions(
H( 4nary operator Ostanding for logical negation(
N( Parentheses to group operators and operands. if necessary at alter the default precedence of operators+-
highest. then l( and finally "(
E7*#23,: An example of a Poolean expression is x l-<y "$=( The sub expression y " $ is true hene"er
either "ariable y or "ariable $ has the "alue true. but the sub expressions is false hene"er both y and $ are
false( The larger sub expression O<y ] $= is true exactly hen y ] $ is false. that is . hen both y and $ are
false( If either y or $ or both are true. then-<y ] $= is false(
Finally. consider the entire expression( Since it is the logical AND of to sub expressions. it is true
exactly hen both sub expressions are true( That is. x l O<y ] $= is true exactly hen x is true. y is false. and
$ is false(
A truth assignment for a gi"en Poolean expression ! assigns either true or false
to each of the "ariables mentioned in !( The "alue of expression ! gi"en a truth assignment T. denoted !<t=
is the result of e"aluation ! ith each "ariable x replaced by the "alue T<x= <true or false= that T assigns to x(
A truth assignments T satisfies Poolean expression ! if !<T=BA1 the truth assignment t ma*es expression !
true A Poolean expression ! is said to be satisfiable if there exists at least one truth assignment T that
satisfies !(
R,2!,8,/)1/6 SAT 1/8)*/c,
The symbols in a Poolean expression are l.].- the left and right parentheses. and symbols
representing "ariables( The satisfiability of an expression does not depend on the names of the "ariables.
only on hether to occurrences of "ariables are the same "ariables are or different "ariables( Thus. e may
assume that the "ariables are xA.xFGG. although in examples e shall continue to use "ariable names li*e
y or $. as ell as x/ s ( e shall also assume that "ariables renamed as e use the loest possible subscripts
for the "ariables (
For instance. Since there are an infinite number of symbols that could in principle. appear in a
Poolean expression. e ha"e a familiar problem o ha"ing to de"ice a code ith a fixed. finite alphabet to
represent expressions ith arbitrarily large number of "ariables( Lnly then can e tal* about SAT as a
,problem-. that is. as a language o"er a fixed alphabet consisting of the codes for those Poolean expressions
that are satisfiable( The code e shall use is as follos(
A( The symbols l .].m( <.and= are represented by themsel"es(
F( The "ariable xi is represented by the symbol x folloed by D/s and A/s that represent I in binary(
Thus. the alphabet for the SAT problem0language has only eight symbols( All instances of SAT are
strings in this fixed. finite alphabet(
!xample consider the expression x l m<y ] $= form example AD(Q our first step in coding it is to replace the
"ariables by subscripted x/s( Since there are three "ariables( >e must use xA. xF.and xH( e ha"e freedom
regarding hich of x. y . and $ is replaced by each of the xi/s and to be specific let xBxA . yBxF and $BxF(
Then the expression becomes xi l m <xF ] xH=( The code for this expression is
5A l m < x AD ] xAA=
Notice that the length of a coded Poolean expression is approximately the same as the number of
positions in the expression. counting each "ariable be current as i( the reason for the difference is a that if the
expression has m position it can ha"e L<m= "ariables. so "ariables may ta*e L<log m= symbols to code(
Thus. an expression hose length is m positions can ha"e a code as long as nBL<m log m= symbols(
2oe"er. the difference beteen m and m log m is surely limited by a polynomial( Thus. as long as
e only deal ith the issue of hether or not a problem can be sol"ed in time that is polynomial in its input
length. there is no need to distinguish beteen the length of an expression/s code and the number of
positions in the expression itself(
N-C"#23,),/,88 "> )+, SAT 2!"<3,#
>e no pro"e ,coo*/s Theorem-. the fact that SAT is NP-complete( To pro"e a problem is NP-
complete. e need first to sho that it is in NP( Then. e must sho that e"ery language in NP reduces to
the problem in #uestion( In general. e sho the second part by offering a polynomial O time reduction from
some other NP-complete problem at right no1 e don/t *no any NP-complete problems to reduce to SAT(
Thus. the only strategy a"ailable is to reduce absolutely e"ery problem in NP to Sat(
E723*1/ )+, C"#23,#,/)8 "> L*/6(*6,8 1/ N? ."!0 . 10 M*!@80
=!1), 8+"!) /"),8 "/ )+, c"#23,#,/)8 "> 3*/6(*6,8 1/ N?
The class of languages P is closed under complementation for a simple argument hy. let L be in P and let
% be a T% for L( %odify % as follos. to accept L( Introduce a ne accepting state # and ha"e the ne
T% transition to # hene"er % halts in a state that is not accepting( %a*e the former accepting states of %
be non accepting( Then the modified T% accepts L. and runs in the same amount of time that % does. ith
the possible addition of one mo"e( Thus. L is in P if L is(
It is not *non hether NP is closed under complementation( It appears not. hoe"er. and in
particular e expect that hene"er a language L is NP complete. then its complement is not in NP(
T+, C3*88 "> L*/6(*6,8 C"-N
&o-NP is the set of languages hose complements are in NP( >e obser"ed at the beginning of
section that e"ery language in P has its complement also in P. and therefore in NP( Ln the other hand. e
belie"e that none of the NP-complete problems ha"e their complements in NP. and therefore no NP-
complete problem is in co-NP( Li*eise. e belie"e the complements( 2oe"er. e should bear in mind
that. should P turn out to e#ual NP. then all three classes are actually the same(
E7*#23,: &onsider the complement of the language SAT hich is surely a member of 6&o-NP e shall
refer to this complement as 4SAT include all those that code Poolean expressions that are not satisfiable(
2oe"er. also in 4SAT are those strings that do not code "alid Poolean expressions . because surely none
of those strings that do not code "alid Poolean expressions. because surely none of those strings are in SAT(
>e belie"e that 4SAT is not in NP but there is no proof(
N-C"#23,), !"<3,#8 */9 C"-N
Let us assume that PB NP( It is still possible that the situation regarding co-NP is not exactly as
suggested by figure( Pecause e could ha"e NP and are in NP. and yet not be able to sol"e them in
deterministic polynomial time( 2oe"er. the fact that e ha"e not been able to find e"en one NP complete
problem hose complement hose complement is in NP is strong e"idence that NPB co-NP. as e pro"e in
the next theorem(
)efer the theorem NPBco-NP if and only if there is some NP-complete problem hose complement is in NP(
=!1), 8+"!) /"),8 "/ !"<3,#8 S"3A*<3, 1/ "3-/"#1*3 S2*c,?."!0 .10
M*!@80
D,8c!1<, )+, 2!"<3,# 8"3A1/6 1/ 2"3-/"#1*3 82*c,?."!0
E723*1/ 2"3-/"#1*3 82*c, "/ T(!1/6 #*c+1/,?."!0S+": )+, !,3*)1"/8+12 "> S */9 NS 2!,A1"(83-
9,>1/,9 c3*88,8?
Initially. e shall distinguish beteen the languages accepted by deterministic and nondeterministic
T%/s ith a polynomial space bound. but e shall soon see that these to classes of languages are the same(
There are complete problems P for polynomial space in the sense that all problems in this class are
reducible in polynomial time to P( Thus. if P is in P or in NP. then all languages ith polynomial space
bounded T%/s are in P or NP respecti"ely( >e shall offer one example of such a problem+ ,#uantified
Poolean formulas-(
"3-/"#1*3-S2*c, T(!1/6 M*c+1/,8
A polynomial-space-bounded Turing machine is suggested( There is some polynomial p<n= such that
hen gi"en input of length n. the T% ne"er "isits more than p<n= cells of its tape(
Define the class of languages PS to include all and only the languages that are L<m= for some
polynomial-space-bounded. deterministic Turing machine %( Also. define the class NPS<nondeterministic
polynomial space= to consist of those languages that are L<%= for some nondeterministic. polynomial-space-
bounded T% %( !"idently PS. NPS. since e"ery deterministic T% is technically nondeterministic also(
2oe"er. e shall pro"e the surprising result that PSBNPS

R,3*)1"/8+12 "> S */9 NS )" !,A1"(83- D,>1/,9 C3*88,8
To start. the relationships P PS and NP NPS should be ob"ious( The reason is that if a
T% ma*es only a polynomial number of mo"es then it uses no more than a polynomial number of cells+ in
particular. it cannot "isit more cells than one plus the number of mo"es it ma*es( Lnce e pro"e PS-NPS.
e shall see that in fact the three classes form a chain of containment+ P NP PS(
An essential property of polynomial-space-bounded T%/s is that they can ma*e only an exponential number
of mo"es before they must repeat an ID( >e need this fact to pro"e other interesting facts about PS. and also
to sho that PS contains only recursi"e languages1 i(e(. languages ith algorithms( Note that there is
nothing in the definition of PS or NPS that re#uires the T% to halt( It is possible that the T% cycles fore"er.
ithout lea"ing a polynomial si$ed region of its tape(
D,),!#1/18)1c */9 N"/9,),!#1/18)1c "3-/"#1*3 S2*c,
Since the comparison beteen P and NP seems so difficult. it is surprising that the same comparison
beteen PS and NPS is easy+ they are the same classes of languages( The proof in"ol"es simulating a
nondeterministic T% that has a polynomial space bound p<n= by a deterministic T% ith polynomial space
bound L<p
F
<n==(
The heart of the proof is a deterministic. recursi"e test for hether a NT% N can mo"e from ID I to
ID n in at most m mo"es( A DT% D systematically tries all middle ID/s j to chec* hether I can become j
in m0F mo"es. and function reach <I.n.m= that decides if I n by at most m mo"es(
Thin* of the tape of D as a stac*. here the arguments of the recursi"e calls to reach are placed(
That is. in one stac* frame D holds 7I.n.%9( A s*etch of the algorithm executed by reach(
BOOLEAN FUNCTION !,*c+.I,J,#0
ID: I,JVINT: #V
BEGIN
IF .#C C10 THEN $G <*818 G$BEGIN
T,8) 1> IC C J "! I c*/ <,c"#, J *>),! "/, #"A,V
RETURN TRUE IF SO, FALSE 1> /")V
ENDV
ELSE $G 1/9(c)1A, 2*!) G$BEGIN
FOR ,*c+ 2"881<3, ID K DO
IF .!,*c+ .I,K,#$40 AND !,*c+.K,J,#$400 THEN
RETURN TRUEV
RETURN FALSEV
ENDV
ENDV

It is important to obser"e that. although reach calls itself tice. it ma*es and therefore only one of
the calls is acti"e at a time( Ln. until at some point the third argument becomes A( At that point. reach can
apply the basis step. and needs no more recursi"e calls( It tests if IBn or I-n. returning T)4! if either holds
and FALS! if neither does( Figure suggests hat the stac* of the DT% D loo*s li*e hen there are as many
acti"e calls to reach as possible. gi"en an initial mo"e count of m(
>hile it may appear that many calls to reach are possible. and the tape of can become "ery long. e
shall sho that it cannot become ,too long-( That is. if started ith a mo"e count of m. there can only be
logF m stac* frames on the tape at any one time( Since Theorem assures us that the NT% N cannot ma*e
more than c
p<n=
mo"es. m does not ha"e to start ith a number greater than that( Thus. the number of stac*
frames is at most( Log
F
c
p<n=
hich is L<p<n==( >e no ha"e the essentials behind the proof of the folloing
theorem(
Theorem+ <Sa"ithc/s Theorem= PSBNPS(
ROOF:
It is ob"ious that PS NPS since e"ery DT% is technically a NT% as ell( Thus. e need only to
sho that NPS PS1 that is if L is accepted by some NT% N ith space bound p<n= ( for some polynomial
p<n= then L is also accepted by some DT% D ith polynomial space bound #<n= for some other polynomial
#<n=( In fact e shall sho that #<n= can be chosen to be on the order of the s#uare of p<n=(
First. e may assume by Theorem AA(H that if N accepts. it does so ithin c
A3p<n=
steps for some
constant c( 'i"en input of length n. D disco"ers hat N does ith input by repeatedly placing the triple
7Io.n.m9 on its tape and calling reach ith these arguments. here+
A( Io is the initial ID of N ith input (
F( n is any accepting ID that uses at most p<n= tape cells1 the different n/s are enumerated
systematically by D. using a scratch tape(
H( mBlogF c
A3p<n=
(
>e argued abo"e that there ill ne"er be more than log
F
m recursi"e calls that are acti"e at the same
time one ith third argument m. one ith m0F one ith m0N and so on. don to A( Thus. there are no more
than log
F
m stac* frames on the stac* and log
F
m is L<p<n==(
In binary. it re#uires B log
F
c
A3p<n=
cells. hich is L<p<n==( Thus. the entire stac* frame. consisting of to ID/s
and an integer. ta*es L<p<n== space(
Since D can ha"e L <p<n== stac* frames at most. the total amount of space used is o<p
F
<n==( This
amount of space is a polynomial if p<n= is polynomial( So e conclude that L has a DT% that is polynomial-
space bounded(
In summary. e can extend hat e *no about complexity classes to include the polynomial-space
classes( The complete diagram is shon in fig(
E/9 "> U/1) -5

You might also like