
Models of Language Recognition:

Automata

159
6. Variations of Automata
The four types of automata, TM, LBA, PDA, and FA (either deterministic or nondeterministic), are typical models of computation that have been extensively investigated in the theory of computation. Many variations of these automata have been introduced, varying the number of tapes or stacks, the tape dimension, the number of heads, the head mobility, and so on. This chapter introduces some of these interesting variations and discusses their computational capability.
By computational capability we mean only a model's capacity to produce a final result, with no regard to the amount of space or time used. If two models have the same computational capability, we say they are equivalent. Simulation is a popular technique for showing that two computational models are equivalent: if each model can simulate the other, then they are equivalent. We will see a couple of applications of this technique.

6.1 Transducers: automata with output 161


Mealy automata, Moore automata
6.2 Variations of TM and LBA 162
Multi-track TM, Multi-tape TM, 2D-tape TM
6.3 Variations of PDA 168
2-way PDA, Multi-stack PDA
PDA accepting with empty stack
6.4 Variations of FA 176
2-way FA, 2D-tape FA, Multi-head FA,
FA array: cellular automata, Hidden Markov models (HMM)
6.5 Church’s hypothesis 181
Exercises 182

160
6.1 Transducers: Automata with output
The automata that we have introduced produce (implicitly) a simple output, “yes,” by entering an accepting state if the input string belongs to their language. By attaching an output port to the finite state control, we can extend this limited output capacity so that for a given input string the automaton produces an output string of finite length (see figure (a) below for an FA). We call such automata transducers.
Depending on when the output is produced, there are two variations: one produces an output while taking a transition, the other after the transition (i.e., on the next state). For FA in particular, the former are called Mealy automata and the latter Moore automata (see figures (b) and (c), where b is an output symbol).

[Figure: (a) a transducer: an FA with an output port attached to the finite state control; (b) a Mealy automaton, whose transition is labeled a/b, i.e., output b is produced on the transition; (c) a Moore automaton, where output b is attached to the state entered.]
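The difference is easy to see in code. Below is a minimal sketch of a Mealy automaton, where the output is emitted on each transition; the state names and the example machine (which outputs 1 exactly when the current input symbol repeats the previous one) are our own illustrative assumptions, not taken from the figure.

```python
# A minimal sketch of a Mealy automaton: the output symbol is produced
# while taking a transition. A Moore automaton would instead attach the
# output to the state entered.

class MealyAutomaton:
    def __init__(self, transitions, start):
        # transitions: (state, input) -> (next_state, output)
        self.transitions = transitions
        self.start = start

    def run(self, inputs):
        state, out = self.start, []
        for a in inputs:
            state, b = self.transitions[(state, a)]
            out.append(b)
        return "".join(out)

# Example machine: output '1' iff the input symbol repeats the previous one.
trans = {
    ("q0", "a"): ("qa", "0"), ("q0", "b"): ("qb", "0"),
    ("qa", "a"): ("qa", "1"), ("qa", "b"): ("qb", "0"),
    ("qb", "a"): ("qa", "0"), ("qb", "b"): ("qb", "1"),
}
m = MealyAutomaton(trans, "q0")
print(m.run("aabba"))  # -> 01010
```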

161
6.2 Variations of TM
The TM with a one-dimensional tape and one read/write head can be extended to a multi-track, multi-tape, or multi-dimensional-tape TM. We will see that these variations are equivalent to the standard TM. In other words, if one of these variations recognizes a language L, we can build a standard TM that recognizes L, and vice versa.

Multi-track TM
A multi-track TM has a read/write tape organized into multiple (i.e., some constant k ≥ 2) tracks. The input string is written on one of the tracks, and during the computation all the cells in one column are read and then written simultaneously.

[Figure: a 2-track TM; the input string is written on the top track, and the head reads and writes a whole column of the tape at a time.]

162
The figures below illustrate that, by encoding the k symbols that can appear in a column of the k-track tape as a single symbol, we can transform a k-track tape TM M into a standard 1-track TM M′ that recognizes the same language.
Conversely, for a given 1-track TM, it is trivial to construct a k-track TM that recognizes the same language. Hence, 1-track TMs and multi-track TMs are equivalent.

[Figure: transforming a 2-track TM M into a 1-track TM M′. Each column of M's tape, e.g. [a, B] or [a, b], is encoded as a single tape symbol of M′, and each transition of M on a column is translated into the corresponding transition of M′ on the encoded symbol.]
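The column-encoding idea can be sketched as follows: each column of a k-track tape (a k-tuple of symbols) is treated as one symbol of an ordinary 1-track tape. The symbols, the blank padding, and the sample tape below are illustrative assumptions.

```python
# Encode k parallel tracks as one track over a tuple alphabet, and back.
# Tuples play the role of the new composite tape symbols of M'.

B = "B"  # blank symbol (an assumption)

def encode(tracks):
    """Turn k parallel tracks into one track over a tuple alphabet."""
    length = max(map(len, tracks))
    padded = [t + [B] * (length - len(t)) for t in tracks]
    return [tuple(col) for col in zip(*padded)]

def decode(tape):
    """Recover the k (blank-padded) tracks from the encoded tape."""
    return [list(track) for track in zip(*tape)]

two_track = [["a", "b", "a"], ["b"]]
one_track = encode(two_track)
print(one_track)  # -> [('a', 'b'), ('b', 'B'), ('a', 'B')]
```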

163

Multi-tape TM
A multi-tape TM has some constant number k ≥ 1 of auxiliary tapes in addition to the input tape. Each tape head can move independently of the others. Given a (k+1)-tape TM M, we can build a (k+1)-track TM M′ which simulates M as follows. Put the k+1 tracks of M′ in correspondence with the k+1 tapes of M as illustrated below. Use a check mark on each track to indicate the position of its corresponding tape head. To simulate the next move of M, M′ scans its tape and collects the k+1 check-marked symbols, applies the transition function of M, and then updates its tape accordingly.
Given a multi-track TM, it is trivial to construct a multi-tape TM that recognizes the
same language. It follows that both TM models are equivalent. Since multi-track TM’s
are equivalent to standard TM’s, multi-tape TM’s are equivalent to standard TM’s.

[Figure: a multi-tape TM M with input tape a b b a and auxiliary tapes, and the multi-track TM M′ simulating it with one track per tape, check marks recording the head positions.]
164

2D-tape TM
We can extend the tape of the standard TM to multiple dimensions. A two-dimensional (2D) tape TM is illustrated below. The finite state control, with a read/write head under it, can move up, down, left, or right. This model is widely used for pattern analysis and searching.
We can show that the tape’s dimension does not affect computational capability. For
example, given a standard TM M, it is trivial to build a 2D-tape TM M’, which, given
the input string on one of its tape rows, simply simulates M step by step.
Conversely, given a 2D-tape TM M, constructing a standard TM M′ that simulates M is a little bit tricky. TM M′, with the 2D input mapped onto its one-dimensional tape row by row with a boundary marker, must keep track of the current position of M. Here we present a rough sketch of the idea.

[Figure: a 2D-tape TM: a finite state control with a 4-way read/write head over a two-dimensional tape.]
165

Given a 2D pattern, we first transform it into the smallest rectangle by filling the
blank cells around the boundary with a special symbol (# in the figure below). (We
need this rectangular form for the convenience of keeping track of the current
position of M when it moves up or down.)
Then we make copies of the rectangular pattern row by row and concatenate
them into a one-dimensional string by inserting a boundary marker ($ in the
figure). Finally we put this one-dimensional string onto the standard TM tape with
a check mark on every symbol appearing in the column where M is positioned.
Now, we let M′ simulate M step by step. The simulation needs tedious housekeeping operations, especially when M moves up, down, or off the pattern. We leave these challenging details for the reader.

[Figure: the 2D pattern of M, padded to a rectangle with #, is written on the one-dimensional tape of M′ row by row with the rows separated by $, e.g. a b a $ b b # $ b a #, and check marks indicate the column where M is currently positioned.]
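The row-by-row mapping can be sketched in a few lines. The pad symbol '#', the marker '$', and the sample pattern are illustrative assumptions following the figure; the key observation is that a vertical move of M becomes a jump of a fixed distance on the 1D tape.

```python
# Map a rectangular 2D tape onto a 1D tape row by row with a '$' boundary
# marker, and convert a 2D head position to the corresponding 1D index.

def flatten(grid, pad="#"):
    width = max(len(row) for row in grid)
    rect = [row + [pad] * (width - len(row)) for row in grid]
    tape = []
    for row in rect:
        tape.extend(row)
        tape.append("$")   # row boundary marker
    return tape, width

def index_1d(row, col, width):
    # Each row occupies width cells plus one '$' separator.
    return row * (width + 1) + col

grid = [["a", "b", "a"], ["b", "b"], ["b", "a"]]
tape, width = flatten(grid)
print("".join(tape))  # -> aba$bb#$ba#$
# Moving "down" on the 2D tape is a jump of width+1 cells on the 1D tape.
```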
166
Variations of LBA
We can extend the LBA model to multi-track, multi-tape, and multi-dimensional tape LBAs, as we did for the TM. In these extensions, we restrict the additional tape space available to the size of the input. Recall that the tape space of the standard LBA is restricted to the length of the input. So, for the two-tape LBA, the machine cannot use an auxiliary tape longer than the input. We can show that these LBA variations are all equivalent to the standard LBA.

Classic Teaching Break Time

A ten year old boy was failing math. His parents tried everything from tutors to hypnosis, but to no avail. Finally,
at the insistence of a family friend, they decided to enroll their son in a private Catholic school. After the first day,
the boy's parents were surprised when he walked in after school with a stern, focused and very determined
expression on his face, and went right past them straight to his room, where he quietly closed the door.
For nearly two hours he toiled away in his room - with math books strewn about his desk and the surrounding
floor. He emerged long enough to eat, and after quickly cleaning his plate, went straight back to his room, closed
the door, and worked feverishly at his studies until bedtime. This pattern continued ceaselessly until it was time for
the first quarter report card. (To be continued on next slide)

167
6.3 Variations of PDA
The PDA model also has many variations, such as the two-way (input head) PDA, the multi-stack PDA, the finite-turn PDA, and the PDA accepting by empty stack. A finite-turn PDA may make only a constant number of turns between pushing and popping during the computation. An empty-stack PDA accepts the input string with an empty stack (instead of by being in an accepting state), together with the condition of having read the whole input.
In contrast to the TM case, these variations of the standard PDA differ in computational capability, and in general it is not easy to prove the differences.

Classic Teaching Break Time

The boy walked in with his report card -- unopened -- laid it on the dinner table and went straight to his room.
Cautiously, his mother opened it, and to her amazement, she saw a bright red "A" under the subject of MATH.
Overjoyed, she and her husband rushed into their son's room, thrilled at his remarkable progress.
"Was it the nuns that did it?", the father asked. The boy only shook his head and said, "No."
"Was it the one-on-one tutoring? The peer-mentoring?"
"No."
"The textbooks? The teachers? The curriculum?"
"Nope," said the son. "On that first day, when I walked in the front
door and saw that guy they nailed to the 'plus sign,' I just knew they meant business!"
- Denise -

168

2-way PDA
The one-way input head of the PDA can be extended to a two-way head, with the same two accepting conditions. This two-way mobility gives more computational power. For example, we can easily construct a 2-way PDA M that recognizes our familiar context-sensitive language {a^i b^i c^i | i > 0}.
To accept this language, M scans the input twice. During the first scan M reads up to the first c and checks whether the prefix before c is a^i b^i for some i > 0. If this check succeeds, M moves back to the left until it sees the last a. Then, scanning right, M makes sure that the suffix is b^i c^i. (We leave the details for the reader.) In Chapter 12 we will prove that this language is not context-free, so no standard PDA can recognize it.

[Figure: a 2-way PDA on input aa…abb…bcc…c, with a two-way read-only input head and a stack with bottom symbol Z0.]
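The two-scan procedure above can be sketched in plain Python. This is a simulation of the algorithm with one stack and a two-way position index, not a formal PDA definition; the function name and code organization are our own.

```python
# Two-scan recognizer for {a^i b^i c^i | i > 0}, following the text's
# outline: first check the prefix a^i b^i, move back to the last 'a',
# then check the suffix b^i c^i.

def accepts(w):
    n = len(w)
    stack, pos = [], 0
    # First scan: the prefix before the first 'c' must be a^i b^i.
    while pos < n and w[pos] == "a":
        stack.append("a"); pos += 1
    if not stack:
        return False
    while pos < n and w[pos] == "b":
        if not stack:
            return False
        stack.pop(); pos += 1
    if stack or pos == n or w[pos] != "c":
        return False
    # Move the head left until it passes the b-block (back to the last 'a').
    while pos > 0 and w[pos - 1] == "b":
        pos -= 1
    # Second scan: the suffix must be b^i c^i.
    while pos < n and w[pos] == "b":
        stack.append("b"); pos += 1
    while pos < n and w[pos] == "c":
        if not stack:
            return False
        stack.pop(); pos += 1
    return pos == n and not stack

print(accepts("aabbcc"), accepts("aabbc"))  # -> True False
```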

169

Multi-stack PDA
Adding one extra stack to the standard PDA raises the computational capability of the machine to that of the TM. In other words, given a TM, we can construct a 2-stack PDA which simulates the TM. Here we sketch how.
Before the simulation begins, the PDA reads the whole input and pushes it into one of its stacks so that the leftmost input symbol (the one under the TM head) appears at the stack top, as figure (b) below shows.
[Figure: (a) a TM with input abbab…aab$ on its two-way read/write tape; (b) a 2-stack PDA with a one-way read-only input head, whose stack 2 holds Z0baa…babba, i.e., the input reversed so that its leftmost symbol is at the top.]

170

The 2-stack PDA can do this by pushing the whole input into one of its stacks and
then transferring it to the other stack. Notice that since the input has no end marker, to
shift the whole input string into one of the stacks, the PDA needs a nondeterministic
move to guess the last symbol of the input.
During the simulation we can always let the symbol read by the TM appear at one
of the stack tops (stack 2 in figure (b) below) by popping a symbol from a stack and
pushing it onto the other stack. Using this trick, it is easy to simulate the TM step by
step. (The details are left to the reader.)
[Figure: during the simulation, stack 1 of the 2-stack PDA holds the tape contents to the left of the TM head (Z0abba) and stack 2 holds the rest (Z0baa…b), with the symbol under the head at the top of stack 2.]
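The pop-and-push trick can be sketched as follows: the two stacks together represent the TM tape, with the head symbol always at the top of stack 2, so a head move is just a pop from one stack and a push onto the other. The blank symbol 'B' and the demo moves are illustrative assumptions.

```python
# Two stacks as a TM tape. Stack 2 (right) holds the symbol under the
# head and everything to its right; stack 1 (left) holds everything to
# the left, with the symbol nearest the head on top.

B = "B"  # blank (an assumption)

class TwoStackTape:
    def __init__(self, inp):
        self.left = []                    # stack 1
        self.right = list(reversed(inp))  # stack 2, head symbol on top

    def read(self):
        return self.right[-1] if self.right else B

    def write(self, sym):
        if self.right:
            self.right[-1] = sym
        else:
            self.right.append(sym)

    def move(self, direction):
        if direction == "R":
            self.left.append(self.right.pop() if self.right else B)
        else:  # "L"
            self.right.append(self.left.pop() if self.left else B)

t = TwoStackTape("abba")
t.write("x"); t.move("R")   # rewrite the first cell, step right
print(t.read())             # -> b
```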


171

PDA’s accepting by empty stack


Recall that for a PDA to accept the input string, it must satisfy two conditions: (1) the whole input must be read, and (2) the machine must be in an accepting state. Substituting an empty stack for condition (2), we get a PDA accepting by empty stack. Here, by empty stack we mean the stack completely emptied by popping out the bottom-of-stack symbol Z0. No computational move is possible with an empty stack.
We can show that the standard PDA and the PDA accepting by empty stack are computationally equivalent. Let Ma be a standard PDA and Me be a PDA accepting by empty stack. Given an Me, we can construct an Ma which recognizes the language L(Me): we simply change every transition of Me that pops Z0 into a transition that enters an accepting state without emptying the stack, as the following figure shows.

[Figure: (a) in Me, transitions (a, Z0/ε) and (ε, Z0/ε) pop the bottom-of-stack symbol; (b) in Ma, they are replaced by (a, Z0/Z0) and (ε, Z0/Z0) entering a new accepting state.]
172

Now suppose that we are given a standard PDA Ma, and assume that Ma does not pop the bottom-of-stack symbol Z0. Otherwise, if Ma has a transition such as δ(p, a, Z0) = (q, ε), we introduce a new stack symbol, say A0, and let the machine rewrite Z0 with A0 instead of popping it (see the figure below). Such a modification does not affect the language accepted by Ma. Notice that Ma never empties its stack, because no transition is defined with A0 at the stack top.

[Figure: each transition (a, Z0/ε) or (ε, Z0/ε) of Ma out of the start state q0 is replaced by (a, Z0/A0) or (ε, Z0/A0), respectively, rewriting Z0 with A0 instead of popping it.]

173

Given an Ma that does not pop the bottom-of-stack symbol Z0, we construct an Me recognizing L(Ma) as follows. Introduce a new state, say qe, and add an ε-transition to this state from every accepting state of Ma. Then let the machine dump the stack contents in state qe. (In the figure below, X denotes any stack symbol, including Z0.)
[Figure: (a) Ma; (b) Me, built by adding an (ε, X/ε) transition from every accepting state of Ma to the new state qe, which keeps popping with (ε, X/ε) until the stack is empty.]
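The construction can be sketched on a transition table. The PDA representation below (a dict of moves, with "eps" standing for an ε-move) and the sample PDA for a+ are our own illustrative assumptions, not a formal definition from the text.

```python
# Final-state -> empty-stack construction: add an eps-move from each
# accepting state into a new state qe, then let qe pop every stack
# symbol (including Z0) until the stack is empty.

def to_empty_stack(delta, accepting, stack_symbols, qe="qe"):
    # delta: (state, input_or_"eps", stack_top) -> set of (state, push_string)
    new_delta = {k: set(v) for k, v in delta.items()}
    for q in accepting:
        for X in stack_symbols:
            # eps-transition from each accepting state into qe, popping X.
            new_delta.setdefault((q, "eps", X), set()).add((qe, ""))
            # In qe, keep popping until the stack is empty.
            new_delta.setdefault((qe, "eps", X), set()).add((qe, ""))
    return new_delta

# A standard DPDA for a+ : loop on 'a' in accepting state q1.
delta = {("q0", "a", "Z0"): {("q1", "Z0")},
         ("q1", "a", "Z0"): {("q1", "Z0")}}
new = to_empty_stack(delta, accepting={"q1"}, stack_symbols={"Z0"})
```

Note that in the result, state q1 with Z0 on top has both a move reading a and an ε-move into qe, so the constructed machine is nondeterministic, exactly the phenomenon the text discusses on the next slides.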

174

We have learned that substituting the empty-stack condition for the condition of being in an accepting state does not affect the class of languages accepted by the PDA, i.e., the classes of languages accepted by the two models are the same. In the proof, we paid no attention to the type (deterministic or nondeterministic) of the PDAs Ma and Me. If the given Me is a DPDA (or an NPDA), then obviously the PDA Ma that we have constructed is also a DPDA (or, respectively, an NPDA). But it does not work the other way around. Here is an example.
It is easy to construct a standard DPDA which recognizes the language a+ (see figure (a) below). If we transform this standard DPDA into a PDA accepting by empty stack, we get an NPDA Me, as shown in figure (b) below. In fact, it is impossible to construct a DPDA that recognizes this language by empty stack. So we can say that the standard DPDA is more powerful than the DPDA accepting by empty stack.
[Figure: (a) DPDA Ma for a+, looping on (a, Z0/Z0) in an accepting state; (b) the PDA Me accepting by empty stack, which on reading a with Z0 on top must nondeterministically choose between (a, Z0/Z0) and the popping move (a, Z0/ε), and hence is an NPDA.]


175
6.4 Variations of FA
The FA also has many variations. In this section we introduce the 2-way FA, the 2D-tape FA, the multi-head FA, cellular automata, and the hidden Markov model (HMM).
2-way FA
It is interesting that, unlike the 2-way PDA, the 2-way FA is computationally equivalent to the standard 1-way FA. This implies that for FA, two-way head mobility adds no capability for recognizing languages. In fact, there is an algorithm that, given a 2-way FA, constructs a 1-way FA recognizing the same language. The algorithm is given in Appendix B with an example.

[Figure: a 2-way FA with a two-way read-only head on input a b a b b.]

176

2-dimensional (2D)-tape FA
A 2D-tape FA model is an FA with a read-only 2D-tape. As in the 2D-tape TM
model, the finite state control can move up, down, to the left or to the right.
However, it cannot write. This model is widely used in the area of pattern
recognition. Sometimes the tape size is restricted to some constant size.

[Figure: a 2D-tape FA: a finite state control moving over a read-only 2D tape.]

177

Multi-head FA
We can add extra read-only heads to the standard FA model. Initially all the
heads are assumed to be reading the leftmost input symbol. The head mobility is
usually restricted to one-way, and we may (or may not) allow the heads to sense
the presence of the others on the same cell.
A multi-head FA model is more powerful than the standard FA. For example, we can construct a 2-head FA that recognizes {a^i b^i | i > 0}, a non-regular context-free language. (In Chapter 12 we will prove that this language is not regular.) The details of the construction are left for the reader.

[Figure: a 2-head one-way read-only FA on input a a a b b b.]

We can construct a 1-way 2-head NFA that recognizes the context-sensitive language {ww$ | w ∈ {a, b}+}, which is not context-free. However, this does not mean that multi-head FA outperform PDA. For example, it has been shown that no 1-way multi-head FA can recognize the context-free language {x x^R $ | x ∈ {a, b}+}.
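The 2-head idea for {a^i b^i | i > 0} mentioned above can be sketched as follows. This is our own assumed head-scheduling strategy, simulated in plain Python rather than given as a formal DFA: head 1 walks over the a-block while head 2 waits, then both advance in lock-step so that head 1 crosses the b's exactly while head 2 crosses the a's.

```python
# One-way 2-head recognizer sketch for {a^i b^i | i > 0}.

def accepts(w):
    h1 = 0
    while h1 < len(w) and w[h1] == "a":   # head 1 skips the a-block
        h1 += 1
    if h1 == 0:
        return False
    h2 = 0
    while h1 < len(w) and w[h1] == "b" and w[h2] == "a":
        h1 += 1                           # head 1 over the b's,
        h2 += 1                           # head 2 over the a's, in lock-step
    # Accept iff head 1 consumed the whole input and head 2 passed all a's.
    return h1 == len(w) and w[h2] == "b"

print(accepts("aaabbb"), accepts("aaabb"))  # -> True False
```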
178
FA array: cellular automata
A cellular automaton (CA) is a regular array of FAs, called cells, connected by one-way or two-way communication lines between neighboring cells. All the cells operate in synchrony. The CA is a typical fine-grained parallel computer model that has been extensively investigated and implemented.
An input string is given either sequentially to a dedicated cell, or in parallel to all the boundary cells. Interested readers are referred to "Theory and Applications of Cellular Automata" (Stephen Wolfram, World Scientific, 1986) or "Cellular Automata: Theory and Experiment" (edited by Howard Gutowitz, MIT Press, 1991).

[Figure: (a) a 1D cellular automaton: a row of interconnected cells; (b) a 2D cellular automaton: a grid of interconnected cells.]
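A 1D cellular automaton is easy to sketch: every cell is a tiny FA whose next state depends on its own state and its two neighbors', and all cells update in synchrony. Rule 90, a classic elementary CA rule, and the fixed zero boundary are illustrative assumptions, not taken from the text.

```python
# One synchronous step of an elementary (two-state, radius-1) 1D CA.
# The 8-bit rule number encodes the next state for each of the 8
# possible (left, center, right) neighborhoods.

def step(cells, rule=90):
    out = []
    for i in range(len(cells)):
        left = cells[i - 1] if i > 0 else 0
        right = cells[i + 1] if i < len(cells) - 1 else 0
        neighborhood = (left << 2) | (cells[i] << 1) | right
        out.append((rule >> neighborhood) & 1)   # look up the rule table
    return out

row = [0, 0, 0, 1, 0, 0, 0]          # a single live cell
for _ in range(3):
    row = step(row)
print(row)  # -> [1, 0, 1, 0, 1, 0, 1]
```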


179

Hidden Markov Models (HMM)


A Markov model (MM) is a probabilistic model widely used in the areas of pattern matching and image recognition. The MM is a variation of the Moore automaton with a transition probability given on each edge (instead of an input symbol) and an output probability given to each output symbol (see the figure below). In general, no start state is designated; every state can be the start state with a given probability.
An MM for a given system can be constructed from a sequence of state transitions and outputs collected by observing the system. A hidden Markov model (HMM) is the case when only the output sequence is available and the state transitions are hidden, and therefore not observable.
[Figure: a two-state Markov model; edges carry transition probabilities (e.g., 0.3), and each state emits output symbols with given probabilities (e.g., a/0.1 and b/0.9 in one state, a/0.5 and b/0.5 in the other).]
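Because only the outputs of an HMM are observable, the basic computation is to sum over all hidden state paths that could have produced a given output sequence; this is the standard forward algorithm. The two-state model below, with its probabilities, is an illustrative assumption loosely echoing the figure.

```python
# Forward algorithm: P(observation sequence) summed over all hidden
# state paths of the HMM.

states = [0, 1]
init = [0.5, 0.5]                       # P(start in state)
trans = [[0.7, 0.3],                    # trans[i][j] = P(i -> j)
         [0.4, 0.6]]
emit = [{"a": 0.1, "b": 0.9},           # emit[i][o] = P(state i emits o)
        {"a": 0.5, "b": 0.5}]

def forward(obs):
    alpha = [init[s] * emit[s][obs[0]] for s in states]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in states) * emit[j][o]
                 for j in states]
    return sum(alpha)

print(round(forward("ab"), 4))  # -> 0.204
```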
180

6.5 Church’s Hypothesis


In this chapter we learned that there are many variations of the four types of automata. Some of them (e.g., the multi-tape TM) are equivalent to the standard model, while others (e.g., the 2-way PDA and the multi-head FA) show upgraded computational capability. With regard to these and other possible computational models, we may ask the following questions:
[ Is there any upper limit on computational capability? In particular, is there any computational model that is more powerful than the TM? ]
To these questions, Church proposed a thesis stating that all possible models of computation, if they are sufficiently broad, must be equivalent to the TM model. This is called Church's hypothesis or the Church-Turing thesis. The hypothesis implies that there is an upper limit on computational capability, namely TM computability. Every computational model introduced so far has turned out to have no more power than the TM model, supporting the hypothesis.

181
Exercises

6.1 Construct the state transition graph of a DPDA which recognizes each of the following languages by empty stack.
(a) L1 = {a^i b^j | i > j > 0 }  (b) L2 = {a^i b^j | i > j > 0 }  (c) L3 = {a^i b^k c^i | i, k > 0 }

6.2 Construct the transition graph of a 2-head 1-way DFA which recognizes the language L below, and briefly describe how it works. Each head, after reading its symbol, can independently move to the right or stay. When you construct the 2-head 1-way DFA, use the notation shown in the figure below. (In Chapter 12 we will prove that it is impossible for any standard FA to recognize L.)
L = { a^i b^i | i > 0 }

[Figure: notation for 2-head DFA transitions. An edge from state p to state q labeled (a, b), (a, a) means: in state p, if head 1 reads a and head 2 reads b, or both heads read a, the 2-head DFA enters state q. An edge from state t to state s labeled (a, ε) means: in state t, head 2 does not read, and if head 1 reads a, the 2-head DFA enters state s.]

182

6.3 Construct the state transition graph of a 2D-tape DTM which draws a figure 8 with character 8's (see figure (b)), using as few states as possible. When the work is complete, your DTM should be in an accepting state.

Finite state control

8 8 8
8 8
8 8 8
8 8
8 8 8

(a) (b)

English Professor Break Time

An English professor wrote the words, “a woman without her man is nothing” on the blackboard and directed the
students to punctuate it correctly.

The men wrote: “A woman, without her man, is nothing.”


The women wrote: “A woman: without her, man is nothing.”
- Mike & Corinne -

183
Hierarchy of the Models

184
7. Hierarchy of the Models: Characterization
We have studied four classes of formal languages and their grammars, four classes of automata and their variations, and other
models for representing or generating languages such as regular expressions, syntax flow graphs and L-systems. In this chapter
we will study the relations, called the Chomsky hierarchy, between these models. The Chomsky hierarchy reveals two important
relationships, characterizations and containments, among the four classes of languages and automata.
The characterizations show the relations between the models for the language generation (i.e., grammars) and those for
language recognition (i.e., automata) or expression. For example, a language L can be generated by a regular grammar if and only
if it is recognizable by an FA, and a language L is recognizable by an FA if and only if it is expressible by a regular expression.
The containments show the set relations among the classes of languages generated (recognized) by the four types of grammars
(respectively, automata). For example, the class of languages generated (recognized) by context-free grammars (respectively, by
PDA) properly contains the class of languages generated (recognized) by regular grammars (respectively, FA). The same set
relation holds among the classes of languages generated (recognized) by type 0, 1, and 2 grammars (respectively, TM, LBA, and
PDA).
In terms of computational capability, this containment relation between the four classes of languages recognized by TM, LBA,
PDA, and FA implies that anything computable by an LBA is also computable by TM, but not necessarily the other way around,
and anything computable by a PDA is also computable by LBA, and so on. Therefore the Chomsky hierarchy provides valuable
information for designing an efficient computational model, as well as for analyzing the computational capability of a given
model. This chapter proves the characterization of the models at the lowest level of the hierarchy, i.e., regular grammars, FA, and
regular expressions. Laying the groundwork through the following chapters, we will complete the proof of the hierarchy in
Chapters 12 and 15.

185
Hierarchy

7.1 Chomsky Hierarchy 187


7.2 Proof of Characterization 191
Constructing an FA from a regular grammar G that recognizes L(G).
Constructing a regular grammar from an FA M that produces L(M).
Constructing an FA from a regular expression R that recognizes L(R).
Constructing a regular expression from an FA M that expresses L(M).
Rumination 215
Exercises 216

Today’s Quote Break Time

You’re alive. Do something. The directive in life, the moral imperative was so uncomplicated. It could be expressed
in single words, not complete sentences. It sounded like this: Look. Choose. Act.
- Barbara Hall -

186
7.1 Chomsky Hierarchy
The Chomsky hierarchy includes the characterization relations between the two different types of models, as illustrated below. For example, L is a language generated by a type 0 grammar if and only if it is recognized by a TM. In this chapter we will prove only the characterization relations among regular grammars, FAs, and regular expressions; we defer the proofs of the other characterizations to Chapter 15. Those proofs are challenging and are usually covered in a graduate course.

[Figure: characterization pairs: type 0 languages ↔ TM, CSL ↔ LBA, CFL ↔ PDA, and regular languages ↔ FA ↔ regular expressions.]

187

For i ∈ {0, 1, 2, 3}, let TYPE-iL be the class of languages generated by type i grammars. The Chomsky hierarchy shows the proper containment relations among the four classes of languages, as illustrated by the Venn diagram below. The class of regular languages is properly contained in the class of context-free languages, which is properly contained in the class of context-sensitive languages, which is in turn properly contained in the class of type 0 (phrase-structured) languages. In Chapter 12 we will see an elegant proof of these containment relations.

[Figure: Venn diagram showing Type-3L ⊂ Type-2L ⊂ Type-1L ⊂ Type-0L.]

188

The characterization and containment relations of the four classes of languages


imply the hierarchy in the computational capability of the four types of automata, as
illustrated in the figure below. Every language recognized by an FA can be
recognized by a PDA, and every language recognized by a PDA can be recognized
by an LBA, and so on.

[Figure: nested computational capability: FA ⊂ PDA ⊂ LBA ⊂ TM.]

The figure on the following page summarizes these relations, called the Chomsky hierarchy (named after Noam Chomsky, who defined the four classes of languages), among the models that we have studied, as well as some interesting models investigated by researchers. This hierarchy is a beautiful piece of knowledge that computer scientists have gained through the advancement of the field.

189
Languages (grammars)        Automata               Other Models

Recursively enumerable      Turing machines        Post systems,
sets (type 0)                                      Markov algorithms,
                                                   μ-recursive functions

Context-sensitive           Linear-bounded
languages (type 1)          automata

Context-free                Pushdown
languages (type 2)          automata

Regular                     Finite state           Regular
languages (type 3)          automata               expressions

(In the figure, vertical arrows denote containment and horizontal double arrows denote characterization.)
190
7.2 Proof of the Characterization
The theorem below states the characterization relations between two models at the
lowest level of the Chomsky hierarchy, which is illustrated by the following figure.
We will prove this theorem.

[Figure: regular languages, FA, and regular expressions characterize one another.]

Theorem. Let L be a language.
(1.1) If L can be generated by a regular grammar, then there is an FA that recognizes L. (This implication will be denoted by RG → FA.)
(1.2) If L is recognizable by an FA, then L can be generated by a regular grammar (FA → RG).
(2.1) If L is expressible by a regular expression, then there is an FA that recognizes L (Re → FA).
(2.2) If L is recognizable by an FA, then L is expressible by a regular expression (FA → Re).

191
Proof (1.1): RG → FA
Let G be a regular grammar. For the proof, we present how to construct an FA that recognizes L(G). Suppose that A and B are two arbitrary nonterminal symbols, and a and b are terminal symbols. The following figure shows how to transform the typical production rules that appear in a regular grammar into the state transitions of an FA recognizing the language generated by the grammar. (Notice that a heavy circle denotes an accepting state.)
Rules                     State transitions

A → abB                   A --a--> (a new intermediate state) --b--> B

A → ε                     Let the state with label A be an accepting state.

S is the start symbol     Let the state with label S be the start state.

A → a                     Put an a-transition from state A to a new accepting state.
192

Example. Transforming a regular grammar to an FA


The following example should suffice to illustrate how to transform an arbitrary regular grammar into an FA which recognizes the language generated by the grammar.

S → abcS | cC    A → aS | B
B → aA | a | ε   C → aB | ac

[Figure: the FA constructed from this grammar, with start state S and accepting states introduced by the ε-rules and the terminal-only rules.]
a
For the proof, let Σ be the set of terminal symbols of G. We must show that for every string x ∈ Σ*, x ∈ L(G) if and only if x is accepted by the FA we have constructed by the above idea. Suppose that string x is derivable by the grammar as follows:
S ⇒ w1V1 ⇒ w1w2V2 ⇒ … ⇒ w1w2…wn−1Vn−1 ⇒ w1w2…wn−1wn = x,
where Vi and wi are, respectively, a nonterminal symbol and a terminal string (or ε) such that Vi → wi+1Vi+1 is a rule in G. We can prove that the string x is accepted by the FA, and the converse is also true. (We leave the detailed proof for the reader.)
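The construction can be sketched concretely. For simplicity this sketch assumes every rule has the form A → aB, A → a, or A → ε (a longer rule like A → abB can first be broken up with fresh nonterminals, as in the figure); the grammar below, generating (ab)*, is an illustrative assumption.

```python
# RG -> FA sketch: each nonterminal becomes a state, a rule A -> aB
# becomes a transition A --a--> B, a rule A -> a goes to a new accepting
# state F, and a rule A -> eps makes state A accepting.

def grammar_to_nfa(rules, start="S"):
    delta, accepting = {}, {"F"}           # "F" is the new accepting state
    for head, bodies in rules.items():
        for body in bodies:
            if body == "":                 # A -> eps : A is accepting
                accepting.add(head)
            elif len(body) == 1:           # A -> a   : transition to F
                delta.setdefault((head, body), set()).add("F")
            else:                          # A -> aB  : transition to B
                delta.setdefault((head, body[0]), set()).add(body[1:])

    def accepts(w):                        # run the resulting NFA
        states = {start}
        for c in w:
            states = {q for s in states for q in delta.get((s, c), set())}
        return bool(states & accepting)
    return accepts

# S -> aA | eps, A -> bS  generates (ab)*
accepts = grammar_to_nfa({"S": ["aA", ""], "A": ["bS"]})
print(accepts("abab"), accepts("aba"))  # -> True False
```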

193
Proof (1.2): FA → RG
Given an arbitrary FA, we transform it into a regular grammar as follows. First, label the start state with the designated start symbol S and the other states with arbitrary (distinct) nonterminal symbols. Then transform each transition into a rule as shown below.

State transition                              Production rule

A --b, c--> B, with a self-loop on a at A     A → bB | cB | aA

S is the start state                          Let S be the start symbol.

A is an accepting state                       A → ε
194

Example: Transforming an FA into a regular grammar

[Figure: the given FA; after labeling, the start state is S and the other states are A, B, C, D, E.]

Label the states, then transform each transition into a rule:

G: S → aS | aA    A → bB    B → bB | bS | aD | ε
   D → aC        C → aB | cE    E → ε

To complete the proof, we need to show that for every string x, the FA accepts x
if and only if the grammar G generates it. (The detailed proof is left for the reader.)
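The reverse construction is mechanical enough to sketch in a few lines. The DFA below (over {a, b}, accepting strings that end in b) and the dict representation are illustrative assumptions.

```python
# FA -> RG sketch: each state becomes a nonterminal, each transition
# p --a--> q becomes a rule P -> aQ, and each accepting state gets an
# eps-rule (written here as the empty string "").

def fa_to_grammar(delta, accepting):
    rules = {}
    for (p, a), q in delta.items():
        rules.setdefault(p, []).append(a + q)
    for q in accepting:
        rules.setdefault(q, []).append("")   # Q -> eps
    return rules

# DFA accepting strings over {a, b} that end in b.
delta = {("S", "a"): "S", ("S", "b"): "B",
         ("B", "a"): "S", ("B", "b"): "B"}
rules = fa_to_grammar(delta, accepting={"B"})
print(rules)  # -> {'S': ['aS', 'bB'], 'B': ['aS', 'bB', '']}
```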

195
Proof (2.1): Re → FA
Let R be a regular expression expressing a regular language L. We construct an FA which recognizes L. Let Σ be the alphabet of L. Following the inductive definition of regular expressions, we show how to construct an FA recognizing the language expressed by R.
(1) If R is a regular expression which is either ∅, ε, or a, for a symbol a ∈ Σ, which respectively express the languages ∅ (the empty set), {ε}, and {a}, we construct an FA recognizing the respective language as follows:

[Figure: three base-case FAs: for ∅, a start state with no accepting state reachable; for ε, a start state that is itself accepting; for a, a start state with an a-transition to an accepting state.]

(2) Suppose that we have constructed two FAs M1 and M2 which recognize the languages expressed by regular expressions r1 and r2, respectively: L(M1) = L(r1) and L(M2) = L(r2).
196

(3) Using M1 and M2, we construct FAs M1+2, M1·2, and M1*, which respectively recognize the languages expressed by the regular expressions r1 + r2, r1r2, and (r1)*, as follows.
To construct the FA M1+2, introduce a new start state and link it by ε-transitions to the start states of M1 and M2, as the following figure illustrates. Clearly, L(M1+2) = L(r1 + r2).

start L(M1) = L( r1)


new  M1 L(M2) = L(r2)
start
 M2
L(M1+2) = L( r1+ r2)
(a) M1+2 start

197

To construct the FA M1·2, we first convert all accepting states in M1 to non-accepting states. Then, from each of these formerly accepting states, we set up an ε-transition to the start state of M2. Clearly, L(M1·2) = L(r1r2).
[Figure: (b) M1·2: the accepting states of M1 are made non-accepting and given ε-transitions to the start state of M2; L(M1·2) = L(r1r2).]

Assumptions Break Time

A car was involved in an accident. As one might expect, a large crowd gathered. A newspaper reporter, anxious to get his story, pushed and struggled to get near the car. Being a clever sort, he started shouting loudly, “Let me through! Let me through please! I am the son of the victim.” The crowd made way for him. Lying in front of the car was a donkey.
- Anonymous -

198

Now we construct an FA M1* to recognize the language expressed by r1*. First introduce a new accepting start state; then set up an ε-transition from this new start state to the old start state, and from every accepting state back to the new start state (see the illustration below). Notice that we need the new start state to be accepting so that the FA accepts ε, which is in the language expressed by r1*.


[Figure: (c) M1*: a new accepting start state with an ε-transition to the old start state of M1, and ε-transitions from every accepting state of M1 back to the new start state; L(M1*) = L(r1*).]

199

Notice that when we construct M1*, if we reuse the old start state as illustrated in figure (a) below, without introducing a new one, the FA M1* may accept a string not in r1*. To see why, consider an FA whose start state lies on a cycle, as shown in figure (b) below. Since the string ab is not accepted by M1, it should not be in r1*. However, as figure (c) shows, the resulting FA accepts ab.
[Figures (a)-(c): if M1* reuses the old start state as its accepting start state, and the start state of M1 lies on a cycle as in (b), then the resulting FA in (c) accepts ab even though ab is not in L(r1*).]
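The three constructions above (M1+2, M12, and M1*, including the caution about always adding a fresh start state for the star) can be sketched in code. The ε-NFA representation below — tuples of integer states, a transition dict with '' standing for an ε-move, a start state, and a set of accepting states — is my own notation for this sketch, not the text's.

```python
# A sketch of the M1+2, M12, and M1* constructions on epsilon-NFAs.
# An NFA is a tuple (states, delta, start, accepts): states are integers,
# delta maps (state, symbol) to a set of next states, and the symbol ''
# stands for an epsilon-move.

def symbol(a):
    """An NFA accepting exactly the one-symbol string a."""
    return ({0, 1}, {(0, a): {1}}, 0, {1})

def relabel(nfa, offset):
    """Shift all state numbers by offset so two machines become disjoint."""
    states, delta, start, accepts = nfa
    return ({q + offset for q in states},
            {(q + offset, a): {r + offset for r in rs}
             for (q, a), rs in delta.items()},
            start + offset,
            {q + offset for q in accepts})

def merge(d1, d2):
    """Combine two transition dicts, copying the value sets."""
    out = {k: set(v) for k, v in d1.items()}
    for k, v in d2.items():
        out.setdefault(k, set()).update(v)
    return out

def union(n1, n2):
    """M1+2: a new start state with epsilon-moves to both old start states."""
    n2 = relabel(n2, max(n1[0]) + 1)
    s = max(n2[0]) + 1
    delta = merge(n1[1], n2[1])
    delta[(s, '')] = {n1[2], n2[2]}
    return (n1[0] | n2[0] | {s}, delta, s, n1[3] | n2[3])

def concat(n1, n2):
    """M12: epsilon-moves from M1's former accepting states to M2's start."""
    n2 = relabel(n2, max(n1[0]) + 1)
    delta = merge(n1[1], n2[1])
    for f in n1[3]:
        delta.setdefault((f, ''), set()).add(n2[2])
    return (n1[0] | n2[0], delta, n1[2], n2[3])

def star(n1):
    """M1*: a NEW accepting start state (reusing the old one is unsafe when
    it lies on a cycle), linked by an epsilon-move to the old start state,
    with epsilon-moves back from every old accepting state."""
    s = max(n1[0]) + 1
    delta = merge(n1[1], {})
    delta[(s, '')] = {n1[2]}
    for f in n1[3]:
        delta.setdefault((f, ''), set()).add(s)
    return (n1[0] | {s}, delta, s, n1[3] | {s})

def accepts(nfa, w):
    """Run the epsilon-NFA on w, maintaining the closure of current states."""
    _, delta, start, acc = nfa

    def closure(qs):
        stack, seen = list(qs), set(qs)
        while stack:
            for r in delta.get((stack.pop(), ''), ()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    cur = closure({start})
    for a in w:
        cur = closure({r for q in cur for r in delta.get((q, a), ())})
    return bool(cur & acc)
```

For instance, star(concat(symbol('a'), symbol('b'))) recognizes (ab)*, accepting ε, ab, abab, and so on, but rejecting a and ba.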
Example: Constructing an FA for a given regular expression.
Based on the approach given above, we show a step-wise construction of an FA
which recognizes the language expressed by the regular expression ((ab + ε)ba)*.
[Figure: the step-wise construction, building FAs for a, b, ab, ab + ε, (ab + ε)ba, and finally ((ab + ε)ba)*.]
Proof (2.2): FA ⊆ Re
Let M be an FA. We will show a method for systematically transforming the state
transition graph of M into a regular expression which expresses L(M).
We first transform all the edge labels (i.e., the input symbols) in the transition
graph into regular expressions (see figure (b) below). Now, we interpret the
transition graph as follows: if there is an edge from a state p to a state q labeled
with a regular expression r, then it implies that M, in state p reading any string in
L(r), enters state q.
By extending the transition function δ, we let it denote the above observation by δ(p, r) = q.
Clearly, labeling the edges with regular expressions this way does not affect the
language accepted by M.
[Figures (a) and (b): a five-state FA whose edges carry the input symbols a, b, c, and the same graph after the parallel labels are rewritten as the regular expressions a+b and a+b+c.]

Now, let G be the state transition graph with edges labeled with regular expressions.
We eliminate states (except for the start state and the accepting states) from G one at
a time, manipulating the edges and their labels without affecting the language recognized
by the automaton. The following example shows how.
[Figure: eliminating state 2 (which has the self-loop a+b) turns the path 1 → 2 → 5 into an edge labeled a(a+b)*(a+b+c); eliminating states 3 and 4 turns the other path from 1 to 5 into an edge labeled baa; merging the two parallel edges gives a(a+b)*(a+b+c)+baa.]
Clearly, the label a(a+b)*(a+b+c)+baa on the edge from the start state to accepting
state 5 is a regular expression which denotes the set of strings accepted by state 5 of
the automaton.


The following figures show a typical case of eliminating a state from a state
transition graph and manipulating the edge labels with a regular expression.
Clearly, the same idea works even when a complex regular expression is
substituted for each simple regular expression in the graph.
[Figure: a state q with incoming edges labeled a and d, outgoing edges labeled b and c, and self-loop f; eliminating q produces direct edges labeled af*b, af*c, df*b, and df*c.]
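The elimination rule just illustrated can be sketched as string surgery on regular-expression edge labels. The representation below — a dict mapping (p, q) pairs to label strings, with labels assumed to be already parenthesized where needed — is my own for this sketch.

```python
# Sketch of eliminating one state q from a regex-labeled transition graph.
# edges[(p, s)] holds the regular-expression label (a string) on the edge
# p -> s. Removing q turns every path p -> q -> s, with self-loop f on q,
# into a contribution  r_pq (f)* r_qs  to the edge p -> s; parallel edges
# are merged with the + operator.

def eliminate(edges, q):
    loop = edges.pop((q, q), None)
    mid = loop + '*' if loop else ''             # f*, or nothing if no self-loop
    ins = [(p, lbl) for (p, t), lbl in edges.items() if t == q and p != q]
    outs = [(s, lbl) for (t, s), lbl in edges.items() if t == q and s != q]
    for key in [k for k in edges if q in k]:     # drop all edges touching q
        del edges[key]
    for p, a in ins:
        for s, b in outs:
            new = a + mid + b
            edges[(p, s)] = (edges[(p, s)] + '+' + new
                             if (p, s) in edges else new)
    return edges
```

Replaying the earlier five-state example (self-loop (a+b) on state 2) and eliminating states 2, 3, then 4 leaves the single label a(a+b)*(a+b+c)+baa from state 1 to state 5, matching the figure.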

If the state transition graph has k (≥ 1) accepting states, the language L(M) is the
union of all the languages accepted by these accepting states. Suppose that M has k
accepting states and, for each i, we found a regular expression ri that denotes the
language accepted by the i-th accepting state. Then the regular expression r denoting the
language accepted by M is given as follows.
r = r1 + r2 + . . . + rk


Example: We compute a regular expression which denotes the language
accepted by the FA shown in figure (a) below.
We will compute regular expressions r4 and r0 that denote the languages accepted
by the accepting states 4 and 0, respectively, as illustrated by figures (b) and
(c) below, and then find the answer r = r4 + r0.
[Figure (a): a DFA with start state 0 and accepting states 0 and 4. Figures (b) and (c): the same machine viewed with 4, respectively 0, as the sole accepting state, so that r = r0 + r4.]

To compute r4, we change the start state 0 to a non-accepting state and then, leaving
only the start state and the accepting state 4, eliminate all the other states one by one.
We begin by eliminating state 2. The order of elimination does not matter:
although the resulting regular expressions may differ, they express the same language.
It is more convenient to eliminate a state that involves fewer transition edges.
[Figure: eliminating state 2; the paths through state 2 become a new self-loop labeled ba on state 1 and new edges among the remaining states.]

Parallel transition edges (i.e., edges having the same origin and destination) are
merged into one edge with all the labels merged into one using the operator +. In the
figures below, notice how the two transitions from state 1 to state 4 and the two loop
transitions on state 1 are, respectively, merged into one.
[Figure: merging parallel edges; the two self-loops a and ba on state 1 merge into a+ba, and the two edges from state 1 to state 4, labeled b and ε, merge into b+ε.]

[Figure: eliminating state 3 and then merging the resulting parallel edges leaves only states 0, 1, and 4, with labels such as a+ba, b+ε, bba, b+bb, and ba on the surviving edges.]

[Figure: eliminating state 1 replaces every path through it by a regular-expression label; for example, the path 0 → 1 → 4 contributes a(a+ba)*(b+ε).]

[Figure: after merging parallel edges, only states 0 and 4 remain: self-loop a(ba+a)*b+b on state 0, edge a(a+ba)*(b+ε) from 0 to 4, edge bba(a+ba)*b+ba from 4 to 0, and self-loop bba(ba+a)*(b+ε)+(b+bb) on state 4.]
In general, with all the states eliminated except for the start state and one accepting
state, we get a transition graph as shown in figure (i) below. We need one more step.
To enter the accepting state, the machine must take a transition from the start state
to the accepting state by (r11)*r12. Once in the accepting state, the machine has
two self-loops as shown in figure (ii): one with label r22, and the other with label
r21(r11)*r12, passing through state 1 and returning to state 2. We finally merge
them into one as shown in figure (iii), and read off the regular expression which
denotes the language accepted by the accepting state 2, as shown in figure (iv).
[Figure: (i) the two-state graph with self-loop r11 on state 1, edge r12 from 1 to 2, edge r21 from 2 to 1, and self-loop r22 on state 2; (ii) the return path collapses into a second self-loop r21(r11)*r12 on state 2; (iii) the two self-loops merge into r22 + r21(r11)*r12; (iv) r2 = (r11)*r12(r22 + r21(r11)*r12)*.]

Now, we go back to the reduced two-state graph obtained earlier (repeated below)
and apply the idea we have just developed. Using the short notation rij for the
label on the edge linking state i to state j, the language accepted by state 4 can be
expressed by the regular expression r4 shown below.
[Figure: the reduced graph with r00 = a(ba+a)*b+b, r04 = a(a+ba)*(b+ε), r40 = bba(a+ba)*b+ba, and r44 = bba(ba+a)*(b+ε)+(b+bb); hence r4 = (r00)*r04(r44 + r40(r00)*r04)*.]

To compute a regular expression r0 which denotes the language accepted by state
0 (the start state) of the FA, we can use the same reduced graph (repeated below). To
compute a regular expression that denotes the language accepted by the start state,
we change the start state back to an accepting state and state 4 to a non-accepting state,
as shown below.
[Figure: the same reduced two-state graph, now with state 0 accepting and state 4 non-accepting.]

By eliminating state 4 and merging the resulting two self-loops on state 0 as
shown, we finally get a regular expression r0. (Notice that we are using the short
notation for convenience.)
[Figure: eliminating state 4 adds a second self-loop r04(r44)*r40 on state 0; merging it with r00 gives r00 + r04(r44)*r40, and hence r0 = (r00 + r04(r44)*r40)*.]
Finally, substituting the full expressions back for the short notation in r0 + r4, we get a regular
expression r which denotes the language recognized by the given FA.

Rumination (1): FA ⊆ Re
• The state elimination technique is easy to practice on a simple graph with paper and pencil. However, it becomes messy
for a large graph. There is an elegant algorithmic approach based on dynamic programming (Kleene's construction, which
computes a regular expression for every pair of states and every bound on the intermediate states allowed).
This algorithm is presented in Appendix C.
• Depending on the order in which the states are eliminated, we will get different regular expressions. However, they are
equivalent, i.e., they denote the same language recognized by the FA.
• It is an intractable problem (i.e., decidable, but with no practical algorithm available in general) to tell whether two arbitrary
regular expressions are equivalent or not. However, for some particular cases, especially if the expressions are simple, we can solve
the problem using the techniques available in this text. The figures below prove the two equivalences (1) and (2) below. Notice that
the same equivalences hold even when an arbitrary regular expression is substituted for a or b in the expressions. (In the following
chapter, we will learn how to convert an NFA to a DFA and a method to minimize the number of states of a DFA.)
[Figure: (1) a* = (a*)* and (2) (a + b)* = (a*b*)*, each proved by converting the right-hand expression to an ε-NFA, eliminating the ε-transitions, converting to a DFA, minimizing the number of states, and reading off the left-hand expression.]
Exercises
7.1 Using the technique presented in Section 7.2, transform the following state transition graph of an FA into a
regular grammar which generates the same language recognized by the FA.
[Figure: the state transition graph of the FA (containing ε-transitions) for Exercise 7.1.]
7.2 Using the technique presented in Section 7.2, transform the following regular grammar into the state transition graph
of an FA which recognizes the same language generated by the grammar.
S  abS | cC A  aS | B | a B  aA |  C  aB | abc

7.3 Compute a regular expression that denotes the language recognized by the FA shown in problem 7.1 above. You should
also show the procedure that you took to get your answer.

7.4 Let L be the language denoted by the following regular expression. (a) Construct the state transition graph of an FA
that recognizes L, and (b) Construct a regular grammar that generates L. You should also show the procedure that you took
to find each answer.
((ba + b)* + (cd(a + b))*)bba

Models of Language Recognition:
Automata
8. Manipulating FA
In the preceding chapter, when we proved the characterizations among the class of regular languages, FA, and regular
expressions, we used NFA's. If we convert a regular grammar or a regular expression to an FA, the automaton usually turns
out to be an NFA. An NFA is a conceptual model that cannot be directly implemented with currently available technology.
However, since every NFA can be converted to a DFA, NFA's are used as convenient tools for solving problems
concerning regular languages and for designing DFA's, which are directly realizable. Figures (a) and (b) below show,
respectively, an NFA and a DFA which recognize the same language {xa | x ∈ {a, b}+}. In this chapter, we will learn
how to convert an NFA to a DFA accepting the same language.
[Figures (a) and (b): an NFA and a DFA recognizing the same language {xa | x ∈ {a, b}+}.]
When we designed automata, we did not pay much attention to their size. However, when designing an
automaton as a model for implementation, we must take the size into account, because it may directly affect the size or the
complexity of the implemented hardware or software.


Figures (a) and (b) below show pairs of DFA's and DPDA's (of different sizes) accepting, respectively, the languages a* and
{aibi | i > 0}. Given an arbitrary deterministic automaton (not to mention a nondeterministic one), it is a very difficult problem to
reduce the number of states of the automaton. No practical algorithms are available except for DFA's. For TM's and LBA's the
problem is unsolvable, and it remains an open problem for PDA's. We will shortly learn how to minimize a DFA.
[Figures (a) and (b): two equivalent DFAs of different sizes for the language a*, and two equivalent DPDAs of different sizes for the language {aibi | i > 0}.]

8.1 Converting an NFA to a DFA
8.2 State Minimization
Rumination
Exercises

Break Time: Family Gift

Amanpreet heard a rumor that his father, grandfather and great-grandfather had all walked on water on their
21st birthdays. So, on his 21st birthday, Amanpreet and his good friend Brian headed out to the lake. "If they did
it, I can too!" he insisted. When Amanpreet and Brian arrived at the lake, they rented a boat and began paddling.
When they got to the middle of the lake, Amanpreet stepped off of the side of the boat . . . and nearly drowned.
Furious and somewhat shamed, he and Brian headed for home.
When Amanpreet arrived back at the family farm, he asked his grandmother for an explanation. "Grandma, why
can I not walk on water like my father, and his father, and his father before him?" The feeble old grandmother
took Amanpreet by the hands, looked into his eyes, and explained, "That's because your father, grandfather, and
great-grandfather were born in January . . . you were born in July, dear."
- Raman -

8.1 Converting NFA to DFA
In this section we will study how to convert an NFA M = (Q, Σ, δ, q0, F) to a DFA
M' = (Q', Σ, δ', q0, F') which recognizes the same language.
Let t ∈ Σ and A, B ⊆ Q. Extend δ such that δ(A, t) gives the set of next states that M
enters after reading the symbol t in every state in A. To convert M to M', first let q0 be
the start state of M' and compute the set δ(q0, t) = A for every t ∈ Σ. Then in M' create
a new state labeled with A and add the transition δ'(q0, t) = A (see figure (b)).
For each new state A created, compute δ(A, t) = B for each t ∈ Σ; create a state
labeled B if it does not already exist, add the transition δ'(A, t) = B, and so on. This procedure is
repeated until no new state is added to M'. This idea is best understood by the
following example.
[Figures (a) and (b): an NFA M with states 0, 1, 2, and the DFA M' obtained from it, whose states are labeled 0, {0,1}, {1,2}, and {2}.]
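The procedure just described can be sketched directly, labeling each DFA state with a frozenset of NFA states. The small NFA driving the sketch below (strings over {a, b} that end in a) is an illustrative stand-in of my own, not the book's example.

```python
# Sketch of the subset construction for an NFA without epsilon-moves.
# delta maps (state, symbol) to a set of NFA states; each DFA state is
# the frozenset of NFA states the machine could currently be in.

def nfa_to_dfa(delta, start, accepts, alphabet):
    d_start = frozenset([start])
    d_delta, work = {}, [d_start]
    while work:
        A = work.pop()
        if A in d_delta:
            continue                      # state A already processed
        d_delta[A] = {}
        for t in alphabet:
            B = frozenset(r for q in A for r in delta.get((q, t), ()))
            d_delta[A][t] = B
            work.append(B)                # repeat until no new state appears
    d_accepts = {A for A in d_delta if A & accepts}
    return d_delta, d_start, d_accepts

def run(d_delta, d_start, d_accepts, w):
    """Drive the resulting DFA over the input string w."""
    A = d_start
    for t in w:
        A = d_delta[A][t]
    return A in d_accepts
```

On the stand-in NFA with delta = {(0, 'a'): {0, 1}, (0, 'b'): {0}} and accepting set {1}, the construction yields the DFA states {0} and {0, 1}, and the DFA accepts exactly the strings ending in a.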

Now, finally, we define the accepting states of M'. Suppose that A ⊆ Q is defined
as a state of M'. This implies that for a string x, if δ'(q0, x) = A, then reading the
string x, the NFA M simultaneously enters all the states in A. If there is an
accepting state i in A, then by definition x is accepted by M, ignoring all the non-accepting
cases. Thus, to make M' accept the same language, we must define A as
an accepting state, as figure (b) below illustrates. If A contains no accepting state,
then state A in M' remains a non-accepting state.
[Figures (a) and (b): reading x, the NFA M nondeterministically enters states i, j, ..., k; the DFA M' deterministically enters the single state {.., i, j, .., k, ..}, which is accepting if any of its members is.]


In the example, since state 2 is the only accepting state of M, in M' we define
every state labeled with a state set containing 2 as an accepting state. It is easy to
prove that L(M) = L(M'). (For the proof, it is enough to show that for every string
x ∈ Σ*, x is accepted by M if and only if x is accepted by M'. We leave the proof
to the reader.)
[Figures (a)-(c): the NFA from before; its transitions converted to deterministic transitions over the state sets 0, {0,1}, {1,2}, {2}; and the final DFA, in which every state whose label contains state 2 is accepting.]
8.2 State Minimization
This section presents a method for minimizing the number of states of a DFA. For
the method we need the following definition. Let M = (Q, Σ, δ, q0, F) be a DFA. Two
states p, q ∈ Q are equivalent if they satisfy the following condition.
For every w ∈ Σ*, δ(p, w) enters an accepting state if and only if δ(q, w) does.
The state minimization technique finds all equivalent states and merges them
into one state. To see how the technique works, consider the DFA in figure
(a) below. Notice that for every input string w ∈ {a, b}*, δ(3, w) enters an accepting state
if and only if δ(4, w) does. Hence, we can merge states 3 and 4 without affecting the
language recognized by the DFA, as shown in figure (b).
[Figures (a) and (b): a four-state DFA in which states 3 and 4 are equivalent, and the DFA obtained by merging them into the single state [3,4].]


Clearly, states 2 and [3, 4] in figure (b) are equivalent, and hence we can merge
them as shown in figure (c) below. Notice that we cannot merge the two states in
figure (c), because one is an accepting state and the other is not. They are not
equivalent because, for w = ε, δ([2, 3, 4], w) = [2, 3, 4], which is an accepting state,
while δ(1, w) = 1 is not. In other words, accepting states and non-accepting states
cannot be equivalent. Now, with this conceptual foundation, we are ready to study a
systematic approach to minimizing a DFA.
[Figures (a)-(c): merging states 3 and 4, and then states 2 and [3,4], reduces the DFA to the two states 1 and [2,3,4].]

There are several algorithms available for finding equivalent states of a DFA to
merge and minimize the number of states. We will study an algorithm, called the state
partitioning technique, which is easy to follow with an example.
Example 1. Consider the DFA M in figure (a) below. We iteratively partition the
states of M into groups such that no two states belonging to different groups are
equivalent. Since accepting states and non-accepting states cannot be equivalent, we
first partition the states into two groups P1 and P2 , the one accepting and the other
non-accepting, as shown in figure (b).
[Figure (a): the six-state DFA M, with start state 0 and accepting states 3, 4, and 5. Figure (b): the initial partition P1 = {3, 4, 5}, P2 = {0, 1, 2}.]


Now, we analyze the responses of each state for every input symbol in the
alphabet {a, b} as follows. For each state q and symbol t ∈ {a, b}, if δ(q, t) = r
and state r belongs to group Pi, we record that the response of state q for the input t
is Pi, as shown in figure (a).
Clearly, no two states showing different responses can be equivalent, because
reading either a or b, one goes to an accepting state and the other goes to a non-accepting
state. So, we partition the groups P1 and P2 as shown in figure (b).

[Figure (a): the response analysis for P1 = {3, 4, 5} and P2 = {0, 1, 2} on the inputs a and b. Figure (b): the resulting refinement P11 = {3}, P12 = {4, 5}, P21 = {0}, P22 = {1, 2}.]


Having partitioned P1 and P2 into the four new groups, we again need to make sure
that all the states in each group are equivalent by analyzing their responses to each
input. Suppose that states i and j in a group show different responses. Then there
must be an input string (of length 2 or longer) for which i enters an accepting state
while j does not, or vice versa. They cannot be equivalent, and should be partitioned
into different groups. For this example, the response analysis below shows that all the
states in every group show the same responses, so we stop the partitioning.
Let p and q be two states belonging to a group of the final partition. For an arbitrary
input string w ∈ {a, b}*, both δ(p, w) and δ(q, w) enter states which belong to
the same group. Since all the states in a group are either all accepting or all non-accepting,
δ(p, w) enters an accepting state if and only if δ(q, w) does. They are equivalent, and
hence we can merge them into one state to reduce the number of states.
[Table: the final response analysis over the groups {3}, {4, 5}, {0}, and {1, 2}; within each group all states show identical responses on a and b, so the partitioning stops.]
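The partitioning loop just described — split a group whenever two of its states show different response signatures, and stop when no group splits — can be sketched as follows. The three-state DFA in the usage below, with two interchangeable accepting states, is a stand-in example of mine, not the book's.

```python
# Sketch of the state-partitioning technique for DFA minimization: start
# from the partition {accepting, non-accepting} and repeatedly split any
# group whose members respond to some input by moving into different groups.

def minimize(states, delta, accepts, alphabet):
    """delta maps (state, symbol) to a single state; returns the final groups."""
    def group_index(q, partition):
        return next(i for i, g in enumerate(partition) if q in g)

    partition = [g for g in (set(accepts), set(states) - set(accepts)) if g]
    while True:
        refined = []
        for g in partition:
            buckets = {}                  # response signature -> sub-group
            for q in g:
                sig = tuple(group_index(delta[(q, t)], partition)
                            for t in alphabet)
                buckets.setdefault(sig, set()).add(q)
            refined.extend(buckets.values())
        if len(refined) == len(partition):
            return [frozenset(g) for g in partition]
        partition = refined
```

On the stand-in DFA (states 0, 1, 2, where 1 and 2 are both accepting and behave identically), the procedure returns the two groups {0} and {1, 2}, so states 1 and 2 can be merged.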

Thus, for the example, merging the two states in {1, 2} and {4, 5}, we get the
minimized DFA shown in figure (b) below.

[Figures (a) and (b): the original six-state DFA and the reduced DFA obtained by merging the states in {1, 2} and the states in {4, 5}.]


Example 2. The figure below shows the results of state partitioning. It shows that no
two states are equivalent, i.e., the DFA is already minimal. For this simple DFA,
we can reach the same conclusion by the following observation: δ(2, aaa) = 5 is
accepting, while δ(3, aaa) = 1 is not. So, states 2 and 3 are not equivalent. Likewise,
δ(3, aa) = 5 is accepting, while δ(4, aa) = 1 is not, and hence states 3 and 4 are not
equivalent, and so on.
[Figure: the five-state DFA whose states 1 through 5 are linked in a cycle of a-transitions, and its successive partitions, which refine step by step into the discrete partition {1} {2} {3} {4} {5}.]

Rumination (1): State Minimization
• Here we present another algorithm for minimizing a DFA. This approach iteratively identifies state pairs which are not
equivalent. For the same example as before (repeated in figure (a) below), the matrix in figure (b) shows the initial contents: an x
mark at every pair of one accepting and one non-accepting state, since such pairs cannot be equivalent. In general, the matrix has
an x mark at entry [i, j] if states i and j are known to be not equivalent. Blank entries are yet to be resolved.

[Figure (a): the six-state DFA from before. Figure (b): the triangular matrix of state pairs, with an x at every accepting/non-accepting pair [i, j] for i ∈ {3, 4, 5} and j ∈ {0, 1, 2}.]


To resolve the entry [4, 3], we examine the two successor states δ(4, b) = 5 and δ(3, b) = 1. Since [5, 1] is marked with x, the entry [4, 3] is
also marked with x. Notice that since δ(4, a) = 4 and δ(3, a) = 3, and entry [4, 3] is itself unresolved, we cannot use this information.
Likewise, we find that the pairs [5, 3], [1, 0], and [2, 0] are not equivalent. Now, we examine [2, 1]. Since δ(2, b) = 5, δ(1, b) = 4, and
entry [5, 4] is not resolved yet, we defer the resolution until [5, 4] is resolved. Since δ(4, a) = 4 and δ(5, a) = 5, and δ(4, b) = 5 and δ(5,
b) = 4, we mark entry [5, 4] with E, implying that states 4 and 5 are equivalent. Since δ(1, a) = δ(2, a) = 3, δ(1, b) = 4, δ(2, b) = 5,
and [5, 4] is marked with E, entry [2, 1] is also marked with E.

[Figure: the completed matrix; entries [2, 1] and [5, 4] are marked E (equivalent), and all the other entries are marked x.]

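The matrix bookkeeping above resolves entries in dependency order, deferring [2, 1] until [5, 4] is known. A common way to sketch the same idea in code is to iterate to a fixed point instead: mark a pair whenever some input symbol sends it to an already-marked pair, and any pair still unmarked at the end is equivalent. This is the standard fixed-point variant, not an exact transcription of the E-marking procedure above, and the DFA in the usage is a small stand-in of mine.

```python
# Sketch of the pair-marking (table-filling) method. A pair is marked as
# soon as it is known to be distinguishable; pairs of distinct states left
# unmarked at the fixed point are equivalent and can be merged.

from itertools import combinations

def nonequivalent_pairs(states, delta, accepts, alphabet):
    # Initially mark every accepting/non-accepting pair.
    marked = {frozenset(p) for p in combinations(states, 2)
              if (p[0] in accepts) != (p[1] in accepts)}
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for t in alphabet:
                succ = frozenset((delta[(p, t)], delta[(q, t)]))
                if len(succ) == 2 and succ in marked:
                    marked.add(pair)      # successors distinguishable, so p, q are
                    changed = True
                    break
    return marked
```

On a stand-in DFA with states 0, 1, 2 where 1 and 2 are interchangeable accepting states, the pairs [1, 0] and [2, 0] come out marked, while [2, 1] stays unmarked.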

• We have studied two methods for minimizing the number of states of a DFA, but we have given no proof of their correctness. We
need to prove that after the minimization, no further state reduction is possible. For the proof, let M be the DFA (figure (a)) that we
get by applying one of the state minimization methods. Suppose that there is a DFA M' (figure (b)) which recognizes the same
language with a smaller number of states than M.

[Figures (a) and (b): M enters distinct states p and q on the inputs w1 and w2, while the smaller M' enters the same state s on both.]

Since M' has fewer states than M and they recognize the same language, by the pigeonhole principle there must be two input strings
w1 and w2 such that M', reading either w1 or w2, enters the same state s, while M enters two different states, p and q. Because the
two automata recognize the same language, state s is an accepting state if and only if both p and q are accepting. Because states p
and q are distinct in M, which was constructed by merging equivalent states, they must not be equivalent.


Since states p and q in M are not equivalent, there is a string x such that reading x in state p, M enters an accepting state,
while reading the same string x in state q, the machine enters a non-accepting state (see figure (a) below).
Now we examine what happens in M' on the same input strings w1x and w2x. Since M' recognizes the same language, reading
w1x it should enter an accepting state, while reading w2x it should enter a non-accepting state. This implies that in state s, reading x,
M' takes two different transitions, as shown in figure (b), so M' would be an NFA, which contradicts our assumption that M' is a
DFA.
[Figures (a) and (b): in M, reading x from p leads to an accepting state while reading x from q does not; in M', state s would need two different transitions on x.]


The state minimization methods that we have studied are applicable only to DFA’s. For example, figure (a) shows an NFA where
no two states are equivalent. However, figure (b) shows a smaller FA which recognizes the same language.
[Figures (a) and (b): an NFA in which no two states are equivalent, and a smaller FA recognizing the same language.]

Break Time: May You Have

May you have enough happiness to make you sweet,
enough trials to make you strong,
enough sorrow to keep you human,
enough hope to make you happy, and
enough money to buy gifts!
- Anonymous -

Exercises
8.1 Using the method presented in this chapter, convert each of the following NFA’s to a DFA. You should also show the
procedure that you took for your answer.

[Figures (a) and (b): the two four-state NFAs for Exercise 8.1; the second contains an ε-transition.]
8.2 Using the state partitioning technique, minimize the following DFA. You should also show how you partitioned the
states to get your answer.

[Figure: the state transition graph of the DFA over {0, 1} for Exercise 8.2.]


8.3 Can we reduce the number of states of the following DFA without affecting the language recognized by it? Justify
your answer.

[Figure: the five-state FA 1 →a 2 →a 3 →a 4 →a 5 for Exercise 8.3.]

8.4 Let M be an NFA, and M’ be the DFA that we get by the technique presented in Section 8.1 for converting an NFA to a
DFA. Prove that L(M) = L(M’). (For the proof, show that for an arbitrary input string w, M accepts w if and only if M’ does.)

Models of Language Generation:
Grammars
9. Properties of the Formal Languages
Every language has structural properties. For example, every string in the language {anbn | n > 0} consists of a's followed by b's,
and their numbers are the same. Since, in general, it is impractical to define a language in terms of such set properties, we use
grammars. However, it is a very difficult problem to figure out the structure of the language generated by a grammar G. In fact,
it has been shown that, for a given grammar G, the decision problem of whether L(G) = Σ* or not is unsolvable, except for
regular and deterministic context-free grammars, where Σ is the set of terminal symbols of G. Given two grammars G1 and
G2, it is also proven that the problem of deciding whether or not L(G1) = L(G2) is unsolvable, except for the cases where both
grammars are regular or deterministic context-free. (For the grammars whose languages are recognized by a DPDA, this
equivalence problem was a long-standing open question before being shown decidable.)

We classify formal grammars (and their languages) into type 0 (phrase structured), type 1 (context-sensitive), type 2
(context-free), and type 3 (regular) according to the format of the rules. An interesting question is: are there any properties
that the languages in each class commonly exhibit? In this chapter, we answer this question for regular and context-free languages
by examining what happens under various set operations.

9.1 Properties of regular languages
Union, Concatenation, Kleene star, Reverse, Complement, Intersection, Set subtraction
9.2 Properties of context-free languages
Union, Concatenation, Kleene star, Reverse, Complement
Rumination
Exercises


9.1 Properties of regular languages


Theorem 9.1 Let L1 and L2 be regular languages. The result of each of the
following set operations is also regular, i.e., regular languages are closed under each
of the following set operations.
(1) L1 ∪ L2 (union)
(2) L1L2 (product)
(3) L1* (Kleene star)
(4) L1R (reversal)
(5) the complement of L1
(6) L1 ∩ L2 (intersection)
(7) L1 - L2 (set subtraction)


Proof. Let G1 and G2 be regular grammars generating, respectively, the languages L1
and L2, and let M1 and M2 be DFA's recognizing, respectively, L1 and L2. (Recall that
every regular language is recognized by a DFA.) Let r1 and r2 be regular expressions
denoting, respectively, L1 and L2. (Recall that every language denoted by a regular
expression is regular.)

(1) (union). The regular expression r1 + r2 denotes L1 ∪ L2. Since every language
expressible by a regular expression is regular, L1 ∪ L2 is regular.

(2) (product). The regular expression r1r2 denotes L1L2. Hence, L1L2 is regular.

(3) (Kleene star). The regular expression (r1)* denotes L1*. Hence, L1* is regular.


(4) (reversal). Using the DFA M1, which recognizes L1, we can construct an FA that
recognizes L1R. (The following example shows the procedure.) This implies that
L1R is also regular.
[Figure: an example DFA, transformed in two steps.] First, add a new accepting state and link an ε-transition from every old accepting state to this new accepting state; this FA recognizes the same language. Then let the added state be the start state, let the old start state be the only accepting state, and reverse all the edges. The resulting NFA recognizes the reverse of the language recognized by the original FA.

(5) (complement). Using the DFA M1, which recognizes L1, we can construct a DFA
that recognizes the complement of L1, as the following example shows. (Recall that,
for convenience, we usually do not show the dead state and the transitions entering it.)
[Figure: an example DFA, transformed in two steps.] First, show the dead state and all the transitions entering it, so that every state has a transition for every input symbol. Then change the accepting states to non-accepting states, and the non-accepting states to accepting states.
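The two steps just shown — completing the DFA with its dead state, then flipping accepting and non-accepting states — can be sketched as follows. The partial DFA in the usage (accepting a+ over {a, b}) is a stand-in of mine, not the book's example.

```python
# Sketch of complementing a DFA: make the transition function total by
# adding the (usually hidden) dead state, then swap accepting and
# non-accepting states.

def complement(states, delta, start, accepts, alphabet):
    dead = max(states) + 1
    total = dict(delta)
    for q in list(states) + [dead]:
        for t in alphabet:
            total.setdefault((q, t), dead)      # missing moves go to the dead state
    new_states = set(states) | {dead}
    return new_states, total, start, new_states - set(accepts)

def run(delta, start, accepts, w):
    """Drive a (total) DFA over the input string w."""
    q = start
    for t in w:
        q = delta[(q, t)]
    return q in accepts
```

With delta = {(0, 'a'): 1, (1, 'a'): 1} and accepting set {1}, the original partial DFA accepts a+; its complement accepts everything else over {a, b}, including ε and any string containing b.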

(6) (intersection). By De Morgan's law, L1 ∩ L2 is the complement of the union of the
complements of L1 and L2. By properties (1) and (5) (i.e., the closure under union
and complementation), L1 ∩ L2 is also regular.

(7) (set subtraction). Since L1 - L2 is the intersection of L1 with the complement of L2,
by the closure properties (5) (complementation) and (6) (intersection), the language
L1 - L2 is also regular.

Break Time: Pressure

Pressure is part of life: adapt to it and use it! I don't do without pressure.
A physical therapist said, "Get in tune with your body."
But if I listened to my body, I'd stay in bed all morning.
- Stan Pottinger -


9.2 Properties of Context-free Languages


The properties of context-free languages are a little different from those of regular
languages. As the following theorem shows, they are not closed under intersection
and complementation.
Theorem 9.2 Context-free languages are closed under union, product, Kleene star,
and reversal. They are not closed under intersection and complementation.
Proof: For the proof, we will use context-free grammars exclusively. (We could use
PDA's, but in general those proofs become somewhat unwieldy.) Let L1 and L2 be two
arbitrary context-free languages that are generated, respectively, by the context-free
grammars G1 = (N1, T1, P1, S1) and G2 = (N2, T2, P2, S2).

Assume that N1 ∩ N2 = ∅, i.e., the two grammars use different nonterminal
symbols. (Otherwise, we can simply rename the nonterminals of one grammar to meet
this condition without changing its language.) Now, we prove the theorem.

(1) (union). We construct a context-free grammar G by merging all the rules from
G1 and G2, and then adding a new start symbol S and the rules S → S1 | S2. Grammar
G generates L1 ∪ L2.


(2) (product). We construct a context-free grammar G by merging all the rules of G1
and G2, and then adding a new start symbol S and the rule S → S1S2. This grammar
generates the language L1L2.

(3) (Kleene star). We modify G1 by introducing a new start symbol S and adding the
rules S → S1S | ε. This modified G1 generates the language (L1)*.

(4) (reversal). To make a context-free grammar that generates (L1)R, we reverse the
right side of every rule in G1.
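The four grammar constructions above can be sketched as operations on rule lists. In this sketch of mine, a grammar is a pair (rules, start symbol), each rule a (lhs, rhs) pair with rhs a tuple of symbols, and — as in the proof — the nonterminal sets of the two grammars are assumed disjoint.

```python
# Sketch of the union, product, star, and reversal constructions on
# context-free grammars given as (rules, start), rules = [(lhs, rhs)].

def cfg_union(g1, g2, s='S'):
    (r1, s1), (r2, s2) = g1, g2
    return (r1 + r2 + [(s, (s1,)), (s, (s2,))], s)   # S -> S1 | S2

def cfg_product(g1, g2, s='S'):
    (r1, s1), (r2, s2) = g1, g2
    return (r1 + r2 + [(s, (s1, s2))], s)            # S -> S1 S2

def cfg_star(g1, s='S'):
    r1, s1 = g1
    return (r1 + [(s, (s1, s)), (s, ())], s)         # S -> S1 S | epsilon

def cfg_reverse(g1):
    r1, s1 = g1
    return ([(lhs, tuple(reversed(rhs))) for lhs, rhs in r1], s1)
```

For example, starting from the grammar A → aAb | ε for {anbn | n ≥ 0}, cfg_reverse produces A → bAa, generating the reversed language {bnan | n ≥ 0}.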
Now we prove that context-free languages are not closed under intersection and
complementation. For this proof, it is enough to show a counterexample.
(5) (intersection). We can show that the languages L1 and L2 below are context-free
by constructing either a PDA or a CFG for each. However, the language L3 is not
context-free, which we will prove in Chapter 12.
L1 = {aibicj | i, j ≥ 0},  L2 = {akbncn | k, n ≥ 0}
L3 = L1 ∩ L2 = {aibici | i ≥ 0}


(6) (complement). To the contrary, suppose that the context-free languages are
closed under complementation. Then both the complement of L1 and the complement
of L2 are context-free. We will use De Morgan's law: L1 ∩ L2 is the complement of
the union of the complements of L1 and L2.
By property (1), the union of the two complements is also context-free. It would then
follow that its complement, L1 ∩ L2, is context-free. This contradicts the fact proven
in (5) that context-free languages are not closed under intersection.

Break Time
Enjoy Sharing
A young man saw an elderly couple sitting down to lunch at McDonald's. He noticed that they had ordered one meal,
and an extra drink cup. As he watched, the gentleman artfully divided the hamburger in half, then counted out the fries,
one for him, one for her, until each had half of them. Then he poured half of the soft drink into the extra cup and set that
in front of his wife. The old man then began to eat, and his wife sat watching, with her hands folded in her lap.
The young man decided to ask if they would allow him to purchase another meal for them so that they didn't have to
split theirs. The old gentleman said, "Oh no. We've been married 50 years, and everything has always been and will
always be shared, 50/50." The young man then asked the wife if she was going to eat, and she replied, "It's his turn with
the teeth."
- Fred & Wanda -

247
Properties of Languages
Rumination (1): language properties
• Property (4) of regular languages says that if L is regular, so is LR. Let G be a regular grammar whose language is L. As
the following example shows, if we reverse the right side of every rule of G, then the resulting grammar G' generates the
reverse of L(G), which is also regular according to the closure property of regular languages under reversal.

G:  S → abA | ac    A → bS | a
G': S → Aba | ca    A → Sb | a

Recall that every rule of a regular grammar has the restricted form of either A → xB or A → y, where A, B ∈ VN and
x, y ∈ (VT)*, and where VN and VT are, respectively, the sets of nonterminals and terminals. In other words, on the right side
of every rule, at most one nonterminal symbol can appear, positioned at the right end, if any. We call such grammars right-
linear. In contrast, left-linear grammars are those whose rules have the restricted form of either A → Bx or A → y.
Both right-linear grammars and left-linear grammars generate the same class of regular languages. We can prove this by
showing that (1) every language generated by a left-linear grammar is regular, and (2) every regular language can be
generated by a left-linear grammar as follows.
Proof (1). Let G be a left-linear grammar. Construct a right-linear grammar G’ by reversing the right side of every rule
of G. Grammar G' generates the reverse of L(G), i.e., L(G) = (L(G'))R. Since G' is right-linear, L(G') is regular. Since
the reverse of every regular language is also regular, L(G) is regular.
Proof (2). We argue with the same logic. Let L be a regular language. By the closure property under reversal, LR is
regular. Let G be a right-linear grammar which generates LR . Construct a left-linear grammar G’ by reversing the right
side of every rule of G. Grammar G’ generates L.

248
Rumination (1): language properties Properties of Languages

• We used DFA to prove that the complement of a regular language is regular. The proof does not work with NFA. Here
we show an example. The language recognized by the NFA in figure (a) is {a, aa}, whose members cannot be in its
complement. However, the NFA in figure (c) accepts both a and aa.

[Figure: (a) an NFA accepting {a, aa}; (b) the same automaton after adding a dead state; (c) the automaton obtained from
(b) by converting accepting states to non-accepting and vice versa, which still accepts both a and aa.]

• For the proof of the closure properties of regular languages, we used various models that characterize regular languages.
To prove the closure property under the union operation, we used regular expressions. We may as well use regular grammars or
FA. However, for an efficient proof, we need to choose the right model. Otherwise, the proof often appears difficult or
sometimes impossible. For example, try to prove the closure property under complementation with regular grammars or
regular expressions. There is no guideline available for the choice, other than building up related knowledge and
experience.

249
Rumination (1): language properties Properties of Languages

• We saw the difference between the closure properties of regular languages and context-free languages. Another contrasting
property is that the class of languages recognized by NPDA properly contains the class of languages recognized by DPDA,
i.e., there are context-free languages that cannot be recognized by any DPDA. For example, it is impossible for a DPDA to
recognize the context-free language {wwR | w ∈ {a, b}+}. We know how to construct an NPDA recognizing this language.
(Recall that a language is recognized by an NFA if and only if it is recognized by a DFA.) Appendix D shows that, like the
class of regular languages, the class of deterministic context-free languages (i.e., the languages recognized by DPDA) is
closed under complementation.

Communication Deadlock Break Time

“Who is calling?” was the answer to the telephone.


“Watt.”
“What is your name, please?”
“Watt’s my name.”
“That’s what I asked you. What’s your name?”
“That’s what I told you. Watt’s my name.”
A long pause, and then from Watt, “Is this James Brown?”
“No, this is Knott.”
“Please, tell me your name.”
“Will Knott.”
Whereupon they both hung up. - Raman -

250
Properties of Languages
Exercises
9.1 Let L be the language generated by the following regular grammar G. Construct a regular grammar whose language is
LR. You should also show the procedure that you took to get your answer.

G: S → aA | bB    A → bD | ε
   B → aC | bE    C → aC | bE | ε
   D → aE | bD | ε    E → aE | bD

9.2 (a) Construct a regular grammar (i.e., right-linear) whose language is the complement of language L generated by the
grammar G above. You should also show the procedure that you took to get your answer.
(b) Construct a left-linear grammar that generates the same language generated by the grammar G above.

9.3 For two context-free grammars G1 and G2 below, (a) construct a context-free grammar whose language is the
product L(G1)L(G2). (b) Construct a context-free grammar whose language is the union, L(G1)  L(G2). You should also
show the procedure that you took to get your answer.

G1: S → aA | bB    A → bD | ε
    B → aC | bE    C → aC | bE | ε        G2: S → aA    A → Sb | b
    D → aE | bD | ε    E → aE | bD

251
Models of Language Generation:
Grammars

252
10. Manipulating Context-free Grammars
Throughout the last three chapters we studied the characterization among the three models of regular grammars, FA and
regular expressions, which belong to the bottom level of the Chomsky hierarchy, and how to manipulate them. In this chapter,
we move one level up the hierarchy and study how to manipulate CFG’s and PDA. Since PDA’s have a stack, it is much harder
to manipulate them than FA.

The nondeterminism in FA does not make any difference in the class of languages recognized by them, i.e., a language is
recognizable by an NFA if and only if it is recognizable by a DFA. Actually, there is an effective method to transform an NFA
to a DFA which recognizes the same language. In contrast, there are context-free languages that can only be recognized by an
NPDA. There is no algorithm available for converting an NPDA to a DPDA or for minimizing the number of states of a
DPDA. Our study will mainly focus on CFG's till Chapter 13, where we will come back to a variation of DPDA and a subclass
of CFG's for an application.

10.1 Reducing ε-production rules 254
Algorithm and example
10.2 Eliminating unit production rules 262
Algorithm and example
10.3 Eliminating useless symbols 266
Algorithm and example
10.4 Normal forms of CFG’s 277
Chomsky normal form (CNF)
Greibach normal form (GNF)
Exercises 285

253
Manipulating CFG
10.1 Minimizing the number of ε-production rules
Using ε-production rules may help in designing a succinct grammar. However, like ε-
transitions in an FA, it can be computationally expensive to automatically analyze
the language generated by a grammar which has many ε-production rules. In this
section we will learn how to minimize the number of ε-production rules in a CFG.
To understand the basic idea of minimizing the number of ε-production rules, let's
see how we can reduce the number of ε-production rules in the CFG G shown
below, while preserving the language of the grammar.
Eliminating A → ε and B → ε, we need to expand the rule S → AB as shown in
G' so that G and G' generate the same language. In G' we keep the rule S → AB for
the case when A and B, respectively, produce a and b. In addition to this rule we
need the rule S → A for the case when A and B, respectively, produce a and ε.
Likewise, we need the rule S → B for the case when A and B, respectively, produce ε
and b. Finally, we need S → ε for the case of A ⇒ ε and B ⇒ ε. We claim that L(G)
= L(G').

G:  S → AB    A → a | ε    B → b | ε
G': S → AB | A | B | ε    A → a    B → b
254
Minimizing -production rules Manipulating CFG

In the previous example, since the language contains ε, the grammar G'
needs one ε-production rule S → ε. If the language does not contain ε, we can
eliminate ε-production rules completely, as the following example shows.

G:  S → ABc    A → a | ε    B → b | ε
G': S → ABc | Ac | Bc | c    A → a    B → b

We apply the same idea when a nonterminal generates ε indirectly, as C and D do in
the CFG G below. (Notice that C and D can derive ε, respectively, using A and B.)

G:  S → CD    C → A | c    D → B | d    A → a | ε    B → b | ε
G': S → CD | C | D | ε    C → A | c    D → B | d    A → a    B → b

255
Minimizing -production rules Manipulating CFG

Now, for the proof of the following theorem we present an algorithm which, given a
CFG G, minimizes the number of ε-production rules of G using the idea presented
above.

Theorem 10.1 Given a CFG G, we can construct a CFG G' that satisfies the
following conditions.
• L(G) = L(G').
• If ε ∉ L(G), G' has no ε-production rule. Otherwise, S → ε is the only ε-production
rule in G'.

Proof (Algorithm): Let G = (VT, VN, P, S) be the CFG. The following algorithm
constructs a CFG G' = (VT, VN, P', S), which satisfies the conditions.

256
Minimizing -production rules Manipulating CFG

Algorithm (minimizing ε-production rules)

Input: a CFG G = (VT, VN, P, S)
Output: a CFG G' = (VT, VN, P', S), where G' satisfies the conditions of the theorem
(1) With the following procedure find all nonterminal symbols in VN which directly
(by line 1) or indirectly (by lines 2 - 5) generate ε, to construct the set W.

1. W = {A | A ∈ VN, A → ε ∈ P};
2. do {
3.   W' = W;
4.   W = W' ∪ {A | A ∈ VN, A → α ∈ P, α ∈ (W')+};
5. } while (W != W');

257
Minimizing -production rules Manipulating CFG

Algorithm (continued)
(2) Eliminate all ε-production rules from the rule set P, and let the resulting set be P1.
(3) With P1, construct the rule set P' of G' as follows: For each rule A → α (A ∈ VN),
construct the following set R.
R = {α' | α' (≠ ε) is a string constructed from α by deleting zero or more symbols
of α that belong to the set W constructed in step (1) of the algorithm.}
For every α' ∈ R add the rule A → α' to the set P'. (Notice that ε ∉ R, i.e., A → ε is
excluded in this step, and every rule in P1 is included in P'.)
(4) If the start symbol S is in W, put the rule S → ε in P'. (Notice that if S ∈ W, then ε
is in the language L(G), and hence, G' needs the rule S → ε.)
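The algorithm above can be sketched in Python. The representation is hypothetical (not from the text): symbols are single characters, a grammar is a dict from each nonterminal to a list of right-hand-side strings, and ε is the empty string.

```python
from itertools import combinations

def nullable(grammar):
    # Step (1): the set W of nonterminals that derive the empty string.
    W = {A for A, rhss in grammar.items() if "" in rhss}
    while True:
        W2 = W | {A for A, rhss in grammar.items()
                  if any(rhs and all(s in W for s in rhs) for rhs in rhss)}
        if W2 == W:
            return W
        W = W2

def minimize_epsilon_rules(grammar, start):
    # Steps (2)-(4): drop eps-rules, expand each rule over its nullable
    # symbols, and re-add S -> eps only if S itself is nullable.
    W = nullable(grammar)
    new = {}
    for A, rhss in grammar.items():
        R = set()
        for rhs in rhss:
            if rhs == "":
                continue                      # step (2): drop A -> eps
            spots = [i for i, s in enumerate(rhs) if s in W]
            for k in range(len(spots) + 1):   # delete 0..k nullable symbols
                for drop in combinations(spots, k):
                    cand = "".join(s for i, s in enumerate(rhs)
                                   if i not in drop)
                    if cand:                  # step (3): eps itself excluded
                        R.add(cand)
        new[A] = sorted(R)
    if start in W:                            # step (4)
        new[start].append("")
    return new
```

For instance, on G: S → AB, A → a | ε, B → b | ε it produces S → AB | A | B | ε, A → a, B → b, as in the discussion above.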

Now, we will show two examples of minimizing the number of ε-production rules
using this algorithm.

258
Minimizing -production rules Manipulating CFG

Example 1. Minimize the number of ε-production rules of the following CFG G.

G: S → CD    C → A | c    D → B | d    A → a | ε    B → b | ε

Applying steps (1) - (4) of the algorithm, we get the following.

(1): W = {A, B}; W' = W;         // W = {A, B}
     W = W' ∪ {C, D}; W' = W;    // W = {A, B, C, D}
     W = W' ∪ {S}; W' = W;       // W = {A, B, C, D, S}
     W = W' ∪ { };               // W = {A, B, C, D, S}
(2): P1 = {S → CD    C → A | c    D → B | d    A → a    B → b}
(3), (4):

G': S → CD | C | D | ε    C → A | c    D → B | d    A → a    B → b

259
Minimizing -production rules Manipulating CFG

Example 2. Minimize the number of ε-production rules of the following CFG G.

G: S → ADC | EFg    A → aA | ε    D → FGH | b    C → c | ε
   E → a    F → f | ε    G → Gg | H    H → h | ε

(1) W = {A, C, F, H}; W' = W;
    W = W' ∪ {G}; W' = W;    // W = {A, C, F, G, H}
    W = W' ∪ {D}; W' = W;    // W = {A, C, D, F, G, H}
    W = W' ∪ {S}; W' = W;    // W = {A, C, D, F, G, H, S}
    W = W' ∪ { }; W' = W;    // W = {A, C, D, F, G, H, S}

260
Minimizing -production rules Manipulating CFG

G: S ADC | EFg A aA |  D  FGH | b C c | 


Ea Ff| G  Gg | H Hh|

(1) : W = {A, C, D, F, G, H, S}
(2) :
P1 = { S ADC | EFg A aA D  FGH | b C c
Ea Ff G  Gg | H Hh }

(3), (4) :

G : S  ADC | AD | AC | DC | A | D | C | EFg | Eg | 
A aA | a D  FGH | FG | FH | GH | F | G | H | b
C c Ea Ff G  Gg | g | H Hh

261
Manipulating CFG
10.2 Eliminating unit production rules
A unit production rule is a rule of the form C → A, whose right side is a single
nonterminal symbol. Unit production rules in a CFG can be eliminated without
affecting the language. For example, minimizing the number of ε-production rules in
Example 1 in the previous section, we got the following grammar.

G': S → CD | C | D | ε    C → A | c    D → B | d    A → a    B → b

Rules C → A and A → a can be reduced to a single rule C → a without affecting the
language generated by the grammar. Hence, we can safely change the rules C → A | c
to C → a | c. For the same reason, we can substitute S → a | c for the rules S → C and
C → a | c to eliminate the unit production rule S → C.
Finally, applying the same idea to the remaining unit production rule S → D, we get the
following grammar, which has no unit production rules. (Notice that A → a and B → b
are removed because they are useless.)

G1: S → CD | a | c | b | d | ε    C → a | c    D → b | d

262
Eliminating unit production rules Manipulating CFG

Now, for the proof of the following theorem, we present an algorithm which, given
an arbitrary CFG G, systematically applies the idea that we have presented above
and eliminates all unit production rules to construct a CFG G', without affecting the
language.
Theorem 10.2 Let G be an arbitrary CFG. We can construct a CFG G' that generates
L(G) with no unit production rules.

Proof (Algorithm): Let G = (VT, VN, P, S). The following algorithm constructs a
CFG G' = (VT, VN, P', S), which generates L(G) with no unit production rules.

Break Time
Today’s Quote

The simple solution for disappointment depression: Get up and get moving.
Physically move. Do. Act. Get going.
- Peter McWilliams -

263
Eliminating unit production rules Manipulating CFG

Algorithm (eliminating unit production rules)

Input: a CFG G = (VT, VN, P, S)      // G has no ε-production rules except S → ε.
Output: a CFG G' = (VT, VN, P', S)   // G' has no unit production rules
                                     // and L(G') = L(G).
// Let A, B ∈ VN and α ∈ (VT ∪ VN)*.
while (there is a unit production rule in P) {
    find a unit production rule A → B in P such that
        B has no unit production rules;
    for every B → α which is not a unit production,
        change A → B to A → α;
} // end while
P' = P; G' = (VT, VN, P', S);

264
Eliminating unit production rules Manipulating CFG

This algorithm, going bottom-up, iteratively changes unit production rules to non-
unit production rules until it sees no unit production rule. The following example should
be enough for understanding how it works. (Notice that by definition the rules
generating a single terminal symbol (e.g., A → a) are not unit production rules.)

G: S → CD | C | D | ε    C → A | c    D → B | d
   A → a    B → bB | b    E → F    F → E

1. C → A | c  ⇒  C → a | c        D → B | d  ⇒  D → bB | b | d
2. S → CD | C | D | ε  ⇒  S → CD | a | c | bB | b | d | ε

G': S → CD | a | c | bB | b | d | ε    C → a | c    D → bB | b | d
    A → a    B → bB | b    E → F    F → E

Notice that the algorithm fails to eliminate cyclic unit production rules. For example,
see the rules involving E and F in G' above. Together with A, these symbols are
useless for generating the language. We will shortly present an algorithm for eliminating
useless symbols in a CFG.
265
Manipulating CFG
10.3 Eliminating Useless Symbols from a CFG
Manipulating a CFG may induce useless symbols (and their rules), as we saw in the
previous section. The following grammar has typical examples of useless
symbols: B, C, D, and E.

G: S → AE | A | B    A → aA | ε    B → C
   C → B    D → Db | b    E → Ec

Symbol D is useless because no sentential forms (i.e., strings consisting of terminal
and nonterminal symbols) containing D are derivable starting with S. Although it is
possible to derive a sentential form containing E starting with S, this symbol is also
useless because no terminal string can be generated starting with it. Symbols B and C
are involved in cyclic unit production rules, so they are useless. Therefore B, C, D, and
E and the rules involving these nonterminal symbols can be safely eliminated to get
the following simple CFG G'.

G': S → A    A → aA | ε

266
Eliminating useless symbols Manipulating CFG

In the previous example we observed the following: in a CFG G = (VT, VN, P, S), a
nonterminal symbol X ∈ VN is useful if it satisfies the two conditions (1) and (2)
below. (Recall that by α ⇒* β we denote the derivation of β, starting with α, in zero or
more steps of applying a rule at a time.)

(1) X ⇒* w, where w ∈ VT*
(2) S ⇒* αXβ, where α, β ∈ (VT ∪ VN)*

These two conditions are illustrated below, i.e., a nonterminal symbol X
is useful if (1) a terminal string w (or ε) is derivable starting with X, and (2) a
sentential form including X is derivable starting with S.

(2)  S ⇒* αXβ    (a sentential form)
(1)       X ⇒* w (a terminal string)

267
Eliminating useless symbols Manipulating CFG

Now we present an algorithm for eliminating useless symbols in a CFG. The
algorithm is presented in two parts by the following two lemmas. Lemma 1 and
Lemma 2 present algorithms for eliminating all symbols that do not satisfy
conditions (1) and (2), respectively.

Lemma 1. Given a CFG G = (VT, VN, P, S), it is possible to construct a CFG
G' = (VT, V'N, P', S) that satisfies the following conditions.
(i) L(G) = L(G')
(ii) For every X ∈ V'N, X ⇒* w, where w ∈ (VT)*

Proof (algorithm 1). Let A be a nonterminal symbol. We compute V'N and P' of
grammar G' with the following algorithm. (Notice that this algorithm uses a
technique similar to the one we used for minimizing the number of ε-production rules.)

268
Eliminating useless symbols Manipulating CFG

Algorithm for constructing V'N and P'

// Collect all nonterminal symbols directly generating a terminal string.
W = {A | A → w ∈ P, w ∈ (VT)*};

// Collect all nonterminal symbols which can derive a terminal string.
do {
    W' = W;
    W = W' ∪ {A | A → α ∈ P and α ∈ (VT ∪ W')*};
} while (W != W')

V'N = W;
P' = {A → α | A → α ∈ P and α ∈ (V'N ∪ VT)*};
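The fixed-point computation of Lemma 1 can be sketched as follows (hypothetical representation, as before: single-character symbols, a grammar as a dict from nonterminal to a list of right-hand-side strings):

```python
def generating(grammar, terminals):
    # W grows until it holds every nonterminal deriving a terminal string.
    W = set()
    while True:
        W2 = {A for A, rhss in grammar.items()
              if any(all(s in terminals or s in W for s in rhs)
                     for rhs in rhss)}
        if W2 == W:
            return W
        W = W2

def lemma1(grammar, terminals):
    # Keep only rules whose symbols all lie in terminals or W.
    W = generating(grammar, terminals)
    return {A: [r for r in rhss
                if all(s in terminals or s in W for s in r)]
            for A, rhss in grammar.items() if A in W}
```

For instance, on G: S → AD | EFg, A → aGD, D → FGd, C → cCEc, E → Ee, F → Ff | f, G → Gg | g, H → hH | h (Example 1 of this section), it keeps exactly S, A, D, F, G, H and drops the rule S → EFg.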
269
Eliminating useless symbols Manipulating CFG

Lemma 2. Given a CFG G = (VT, VN, P, S), it is possible to construct a CFG G' =
(V'T, V'N, P', S) that satisfies the following conditions.
(1) L(G) = L(G')
(2) For every symbol X ∈ V'T ∪ V'N, it is possible to
derive a sentential form containing X starting with S, i.e.,
S ⇒* αXβ, where α, β ∈ (VT ∪ VN)*

Proof (algorithm 2). Starting with empty sets V'T and V'N, the algorithm
progressively builds them up as follows. Put the start symbol S in V'N. Suppose that
A is a nonterminal symbol that has just been put in V'N. Then for every rule A → α
and every symbol X that appears in α, if X is a nonterminal symbol, put it in V'N.
Otherwise, if it is a terminal symbol, put it in V'T. The algorithm repeats this process
until no more nonterminal symbols are added to V'N.
Then the algorithm constructs P' by collecting all the rules in P that use only the
symbols in V'T ∪ V'N.

270
Eliminating useless symbols Manipulating CFG

Algorithm (Lemma 2)
Input: G = (VT, VN, P, S)
Output: G' = (V'T, V'N, P', S)   // G' satisfies the two conditions of the lemma.

Let V'T = V'N = V = ∅; V = V ∪ {S};   // initialize

do {   // Iteratively build up V'T and V'N
    V'N = V;
    V = V ∪ {B | A ∈ V'N, A → α ∈ P, and α contains B ∈ VN};
    V'T = V'T ∪ {a | A ∈ V'N, A → α ∈ P, and α contains a ∈ VT};
} while (V'N != V)

P' = {A → α | A → α ∈ P, A ∈ V'N, α ∈ (V'T ∪ V'N)*};
G' = (V'T, V'N, P', S);

271
Eliminating useless symbols Manipulating CFG

Theorem 10.3 Given a CFG G = (VT, VN, P, S), it is possible to construct a CFG
G' = (V'T, V'N, P', S) that satisfies the following conditions.
(1) L(G) = L(G')
(2) For every X ∈ V'N, X ⇒* w for some w ∈ (V'T)*
(3) For every X ∈ V'T ∪ V'N, S ⇒* αXβ, where α, β ∈ (V'N ∪ V'T)*

Proof. We construct such a CFG G' using the two algorithms presented in Lemmas 1
and 2, in that order.

Break Time
Today’s Quote

He who receives a benefit should never forget it;
he who bestows one should never remember it.
- Pierre Charron -

272
Eliminating useless symbols Manipulating CFG

Example 1. Eliminate all useless symbols in the following CFG.

G: S → AD | EFg    A → aGD    D → FGd    C → cCEc
   E → Ee    F → Ff | f    G → Gg | g    H → hH | h

Step 1: Apply the algorithm of Lemma 1:

W = {F, G, H};
W' = W; W = W' ∪ {D};    // W = {D, F, G, H}
W' = W; W = W' ∪ {A};    // W = {A, D, F, G, H}
W' = W; W = W' ∪ {S};    // W = {A, D, F, G, H, S}
W' = W; W = W' ∪ { };    // W = {A, D, F, G, H, S}
V'N = W                  // V'N = {A, D, F, G, H, S}

G1: S → AD    A → aGD    D → FGd    F → Ff | f
    G → Gg | g    H → hH | h

273
Eliminating useless symbols Manipulating CFG

G1: S AD A aGD D  FGd F  Ff | f


G  Gg | g H  hH | h

Step 2: Apply the algorithm of Lemma 2 to G1:


1. VT = VN = V = { };
2. V = V  {S}; VN = V; // VN = {S}
3. V = V  {A, D}; VT = VT  { };
VN = V; // VN = {S, A, D}, VT = { }
4. V = V  {G, F} ; VT = VT  {a, d};
VN = V; // VN = {S, A, D, G, F}, VT = {a, d}
5. VN = VN  {} ; VT = VT  {g, f} = {a, d, g, f};
VN = V; // VN = {S, A, D, G, F}, VT = {a, d, g, f}

274
Eliminating useless symbols Manipulating CFG

G1: S AD A aGD D  FGd F  Ff | f


G  Gg | g H  hH | h

Apply the algorithm of Lemma 2

VN = {S, A, D, G, F}, VT = {a, d, g, f}

Select the rules that contain only the


symbols in VN  VT

Final result:
G: S AD A aGD D  FGd F  Ff | f
G  Gg | g

275
Eliminating useless symbols Manipulating CFG

To eliminate useless symbols, we applied the algorithm of Lemma 1 followed by
the algorithm of Lemma 2. If we apply the algorithm of Lemma 2 first, we may fail
to eliminate all useless symbols. The following example shows why.

G: S → AB | a    A → a

Applying Lemma 1 first and then Lemma 2:
G1: S → a    A → a    (after Lemma 1)
G': S → a             (after Lemma 2)

Applying Lemma 2 first and then Lemma 1:
G1: S → AB | a    A → a    (after Lemma 2)
G': S → a    A → a         (after Lemma 1; the useless symbol A remains)

276
Manipulating CFG
10.4 Normal Form of CFG
We know that CFG rules have no restriction on their right side. There can be any
string (of finite length) of terminal and/or nonterminal symbols. This "free form" of
CFG rules makes it more complex to solve certain problems concerning context-free
languages. It is often desirable to have the rules simplified in their form without
affecting the language of the grammar. In this section we will study two such
forms, called Chomsky normal form and Greibach normal form, defined below.

Let G = (VN, VT, P, S) be a CFG and let A, B, C ∈ VN, a ∈ VT and α ∈ (VN)*.
• If every rule of G has the form A → BC or A → a, then G is called a
Chomsky normal form (CNF) grammar.
• If every rule of G has the form A → aα, then G is called a
Greibach normal form (GNF) grammar.

Note that the definitions are only concerned with CFG's whose language does not
contain ε.

277
Normal Form of CFG Manipulating CFG

Converting a CFG to CNF

Given a CFG, we will show how to convert the grammar to CNF. The conversion is
done rule by rule. The following example should be enough to understand the idea.

• Converting A → aBCDbE to a set of rules in CNF:

A → A1B1    A1 → a    // and let B1 derive BCDbE as follows,
B1 → BC1              // and let C1 derive CDbE as follows,
C1 → CD1              // and let D1 derive DbE as follows,
D1 → DE1              // and let E1 derive bE,
E1 → F1E    F1 → b    // and finally let F1 derive b.
* Subscripted symbols are newly introduced nonterminal symbols.

Starting with A and applying the above CNF rules in order, we can derive
aBCDbE, which can be derived by applying the single rule A → aBCDbE.
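The rule-by-rule conversion can be sketched in Python (hypothetical representation: a right-hand side as a list of symbols, lowercase for terminals; the fresh names T1, T2, ... and N1, N2, ... are assumed unused in the grammar):

```python
import itertools

def to_cnf_rules(head, rhs):
    # Split one rule head -> rhs into Chomsky-normal-form rules.
    t_fresh = itertools.count(1)   # for terminal wrappers T1, T2, ...
    n_fresh = itertools.count(1)   # for chain nonterminals N1, N2, ...
    rules = []

    def term_wrap(sym):
        # a terminal inside a long RHS gets its own nonterminal
        if sym.islower():
            n = f"T{next(t_fresh)}"
            rules.append((n, [sym]))
            return n
        return sym

    syms = list(rhs)
    if len(syms) == 1:             # head -> a is already in CNF
        rules.append((head, syms))
        return rules
    syms = [term_wrap(s) for s in syms]
    while len(syms) > 2:           # head -> X1 N,  N -> X2 ... Xn
        n = f"N{next(n_fresh)}"
        rules.append((head, [syms[0], n]))
        head, syms = n, syms[1:]
    rules.append((head, syms))
    return rules
```

For A → aBCDbE it emits the chain A → T1N1, N1 → BN2, N2 → CN3, N3 → DN4, N4 → T2E plus the wrappers T1 → a, T2 → b, matching the hand conversion above up to the choice of fresh names.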
278
Normal Form of CFG Manipulating CFG

Example 1. Convert the following CFG to CNF.

(1) S → AaBCb    (2) A → abb    (3) B → aC    (4) C → aCb | ac

For each rule, we apply the idea that we
have presented above.

(1) S → AA1    A1 → A2A3    A2 → a    A3 → BA4    A4 → CA5    A5 → b
(2) A → B1B2    B1 → a    B2 → B3B4    B3 → b    B4 → b
(3) B → C1C    C1 → a
(4) C → D1D2 | E1E2    D1 → a    D2 → CD3    D3 → b    E1 → a    E2 → c

We leave it for the reader to prove that the conversion does not affect the
language of the grammar.

279
Normal Form of CFG Manipulating CFG

Converting a CFG to GNF

GNF: A → aα, where A ∈ VN, a ∈ VT, α ∈ (VN)*

The algorithm for converting a CFG to GNF is not simple, not to mention
the proof of its correctness. We shall only present an example to show the
basic idea.
We need some terminology before presenting the algorithm. We call a rule front-
recursive if its right side starts with the same nonterminal symbol as the one on the
left side (e.g., A → AB). A front-nonterminal is a nonterminal symbol that is located
at the left end of the right side of a rule, and a non-front terminal is a terminal
symbol that is not located at the left end of the right side of a rule. The algorithm
works in the following three phases.
1. Eliminate front-recursions
2. Eliminate front-nonterminals
3. Eliminate non-front terminals

280
Normal Form of CFG Manipulating CFG

GNF: A → aα, where A ∈ VN, a ∈ VT, α ∈ (VN)*

The algorithm repeatedly applies phases 1 and 2 until it sees neither front-
recursions nor front-nonterminals. Then it applies Phase 3 to eliminate non-front
terminals in the rules to transform them finally into GNF.
Phase 1. Consider the following rules with a front-recursion. We introduce a new
nonterminal (N in the example) and eliminate the front-recursion as follows.

A → AB | CD   ⇒   A → CDN | CD    N → BN | B

Notice that a front-recursive rule must be accompanied by a non-front-recursive rule, for
example A → CD in the above example. Otherwise, the front-recursive rule is
useless and can be eliminated. In general, let A → β be a non-front-recursive rule.
Then the front-recursion A → Aα can be eliminated as follows.

A → Aα | β   ⇒   A → βN | β    N → αN | α

281
Normal Form of CFG Manipulating CFG

Phase 2. Find a rule with a front-nonterminal (B in the example below) such that
every rule of B has a front terminal. Substitute B's right sides into the rule with the
front-nonterminal, so that the resulting rules have a front terminal, as follows.

A → BbC | aA        A → cDAbC | aEbC | aA
                ⇒
B → cDA | aE        B → cDA | aE

Phase 3. Convert every non-front terminal to a new nonterminal as follows.

A → cDAbC   ⇒   A → cDANC    N → b

We can show that the above three phases of conversion do not affect the language
of CFG G. (We omit the proof, which is challenging.)

282
Normal Form of CFG Manipulating CFG

Example: Converting a CFG to GNF

CFG: S → SA | BeA    A → AS | a    B → b

Phase 1 ⇒
S → BeAC | BeA    C → AC | A
A → aD | a    D → SD | S
B → b

Phase 2 ⇒
S → beAC | beA    C → aDC | aC | aD | a
A → aD | a    D → SD | S
B → b

Phase 2 ⇒
S → beAC | beA    C → aDC | aC | aD | a
A → aD | a    D → beACD | beAD | beAC | beA
B → b

283
Normal Form of CFG Manipulating CFG

The result of the Phase 2 iteration:

S → beAC | beA    C → aDC | aC | aD | a
A → aD | a    D → beACD | beAD | beAC | beA
B → b

Phase 3 ⇒
S → bEAC | bEA    C → aDC | aC | aD | a
A → aD | a    D → bEACD | bEAD | bEAC | bEA
B → b    E → e

Rule B → b, which is useless, can be eliminated using the algorithm presented in
Section 10.3.

284
Manipulating CFG
Exercises
10.1 With the CFG G given below, follow the steps (i) - (iii) below to construct a CFG G' such that L(G) = L(G') with the smallest
possible number of ε-production rules.
(i) Find the set of nonterminal symbols that derive ε. You should also show the procedure that you took for your answer.
(ii) Eliminate all ε-production rules from the grammar.
(iii) Find CFG G' using the result from step (ii) above.

G: S → ABCd | BEF    A → aA | ε    B → FGH | b    C → cC | ε
   E → a | ε    F → f | ε    G → Gg | H    H → h | ε

10.2 Using the algorithm presented in Section 10.2, eliminate all unit production rules from the following CFG G. You should
also show how you got your answer for the grammar.

G: S → AB | A | B | ε    A → C | c    B → D | d
   D → bD | B | b    C → a

285
Exercises Manipulating CFG

10.3 This problem is designed to eliminate all useless symbols from the CFG G given below in two steps (a) and (b). You should
show the results from each step.
(a) Using the algorithm presented in Lemma 1 in Section 10.3, construct a CFG G1 that satisfies the following conditions.
(i) L(G) = L(G1)
(ii) Every nonterminal symbol of G1 derives a terminal string.
(b) Apply the algorithm of Lemma 2 in Section 10.3 and construct a CFG G2 that satisfies the following conditions.
(i) L(G) = L(G2)
(ii) For every terminal or nonterminal symbol X of G2, it is possible to derive a sentential form containing X.

G: S → BA | CDg    B → aEA    A → DEd    C → Ce    D → Df | ε
   E → Eg | g    F → hF | h    G → cGCc    I → cDEFab

10.4 Transform the following CFG into Chomsky normal form.

S → abABCde    A → aAa    B → ba    C → cab

286