7. Hierarchy of the Models: Characterization
We have studied four classes of formal languages and their grammars, four classes of automata and their variations, and other
models for representing or generating languages such as regular expressions, syntax flow graphs and L-systems. In this chapter we
will study the relations, called the Chomsky hierarchy, between these models. The Chomsky hierarchy reveals two important
relationships, characterizations and containments, among the four classes of languages and automata.
The characterizations show the relations between the models for the language generation (i.e., grammars) and those for language
recognition (i.e., automata) or expression. For example, a language L can be generated by a regular grammar if and only if it is
recognizable by an FA, and a language L is recognizable by an FA if and only if it is expressible by a regular expression.
The containments show the set relations among the classes of languages generated (recognized) by the four types of grammars
(respectively, automata). For example, the class of languages generated (recognized) by context-free grammars (respectively, by
PDA) properly contains the class of languages generated (recognized) by regular grammars (respectively, FA). The same set
relation holds among the classes of languages generated (recognized) by type 0, 1, and 2 grammars (respectively, TM, LBA, and
PDA).
In terms of computational capability, this containment relation between the four classes of languages recognized by TM, LBA,
PDA, and FA implies that anything computable by an LBA is also computable by TM, but not necessarily the other way around,
and anything computable by a PDA is also computable by LBA, and so on. Therefore the Chomsky hierarchy provides valuable
information for designing an efficient computational model, as well as for analyzing the computational capability of a given
model. This chapter proves the characterization of the models at the lowest level of the hierarchy, i.e., regular grammars, FA, and
regular expressions. Laying the groundwork through the following chapters, we will complete the proof of the hierarchy in
Chapters 12 and 15.
Hierarchy
You’re alive. Do something. The directive in life, the moral imperative was so uncomplicated. It could be expressed
in single words, not complete sentences. It sounded like this: Look. Choose. Act.
- Barbara Hall -
7.1 Chomsky Hierarchy
The Chomsky hierarchy shows the characterization relations between two
different types of models as illustrated below. For example, L is a language
generated by type 0 grammar if and only if it is recognized by a TM. In this chapter
we will only prove the characterization relations between regular grammars, FA’s
and regular expressions. We defer the proofs for other characterizations till Chapter
15. These proofs are challenging, and usually included in a graduate course.
[Figure: characterization relations between the models; surviving labels: FA, Regular, Regular exp]
For i ∈ {0, 1, 2, 3}, let TYPE-iL be the class of languages generated by type i
grammars. The Chomsky hierarchy shows the proper containment relation among the
four classes of languages as illustrated by the Venn diagram below. The class of
regular languages is properly contained in the class of context-free languages. The
class of context-free languages is properly contained in the class of context-sensitive
languages, which is in turn properly contained in the class of type 0 (phrase-structured)
languages. In Chapter 12, we will see an elegant proof of these containment relations.
[Figure: Venn diagram of the nested classes Type-3L, Type-2L, Type-1L, and Type-0L, from innermost to outermost]
[Figure: the corresponding classes of automata: TM, LBA, PDA, FA]
The figure on the following page summarizes these relations, called the
Chomsky hierarchy (named after Noam Chomsky, who defined the four classes of
languages), among the models that we have studied, as well as some interesting
models investigated by researchers. This hierarchy is a beautiful piece of knowledge
that computer scientists have gained through the advancement of the field.
[Figure: summary table of the Chomsky hierarchy with columns Languages (grammars), Automata, and Other Models. The only surviving row pairs context-sensitive languages (type 1) with linear-bounded automata.]
7.2 Proof of the Characterization
The theorem below states the characterization relations between two models at the
lowest level of the Chomsky hierarchy, which is illustrated by the following figure.
We will prove this theorem.
[Figure: characterization relations among regular languages, FA, and regular expressions]
Proof (1.1): RG ⇒ FA
Let G be a regular grammar. For the proof, we show how to construct an
FA that recognizes L(G). Suppose that A and B are two arbitrary nonterminal symbols,
and a and b are terminal symbols. The following figure shows how to transform the
typical production rules that appear in a regular grammar into the state
transitions of an FA which recognizes the language generated by the grammar.
(Notice that a heavy circle denotes an accepting state.)
Rules        State transitions
A → abB      A --a--> (new intermediate state) --b--> B
A → B        A --ε--> B
A → ε        Let the state with label A be an accepting state.
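The rule-to-transition table above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the text's own code: the names `grammar_to_nfa`, `eps_closure`, and `accepts` are hypothetical, the empty string `""` stands for ε, and rules with no nonterminal on the right (such as A → ab) are assumed to end in a fresh accepting state, a case the table does not show. The sample grammar is also made up for illustration.

```python
from itertools import count

def grammar_to_nfa(rules, start="S"):
    """rules maps a nonterminal to a list of (terminals, next_nonterminal) pairs.
    next_nonterminal is None for rules such as A -> ab or A -> ε."""
    fresh = count()
    delta = {}          # (state, symbol) -> set of states; symbol "" means ε
    accepting = set()

    def add_edge(p, symbol, q):
        delta.setdefault((p, symbol), set()).add(q)

    for head, bodies in rules.items():
        for terminals, target in bodies:
            if target is None and terminals == "":
                accepting.add(head)                 # A -> ε: A is accepting
                continue
            if target is None:                      # A -> ab (assumed case)
                target = ("final", next(fresh))
                accepting.add(target)
            if terminals == "":
                add_edge(head, "", target)          # A -> B: ε-transition
                continue
            state = head
            for ch in terminals[:-1]:               # A -> abB: chain of new states
                mid = ("mid", next(fresh))
                add_edge(state, ch, mid)
                state = mid
            add_edge(state, terminals[-1], target)
    return delta, start, accepting

def eps_closure(delta, states):
    """All states reachable from `states` by ε-transitions alone."""
    stack, seen = list(states), set(states)
    while stack:
        p = stack.pop()
        for q in delta.get((p, ""), ()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen

def accepts(nfa, word):
    delta, start, accepting = nfa
    current = eps_closure(delta, {start})
    for ch in word:
        moved = set()
        for p in current:
            moved |= delta.get((p, ch), set())
        current = eps_closure(delta, moved)
    return bool(current & accepting)

# Hypothetical grammar: S -> abS | A,  A -> cA | ε   (language (ab)*c*)
sample = {"S": [("ab", "S"), ("", "A")], "A": [("c", "A"), ("", None)]}
nfa = grammar_to_nfa(sample)
print(accepts(nfa, "ababcc"), accepts(nfa, "ca"))
```

Each multi-symbol rule costs one new state per extra terminal, so the FA has at most as many states as there are nonterminals plus terminal occurrences in the grammar.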
Proof (1.2): FA ⇒ RG
Given an arbitrary FA, we transform the FA into a regular grammar as follows.
First, label the start state with the designated symbol S and each of the other
states with a distinct nonterminal symbol. Then transform each transition into
a rule as shown below.
State transitions                  Rules
A --a--> A,  A --b, c--> B         A → bB | cB | aA
S is the start state               Let S be the start symbol.
A is an accepting state            A → ε
[Figure: an example FA (left) and the same FA with its states labeled S, A, B, C, D, and E (right)]
Label the states, then transform each transition into a rule:
G: S → aS | aA
   A → bB
   B → bB | bS | aD | ε
   D → aC
   C → aB | cE
   E → ε
To complete the proof, we need to show that for every string x, the FA accepts x
if and only if the grammar G generates it. (The detailed proof is left for the reader.)
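The transformation from transitions to rules is mechanical, as the following Python sketch suggests. The function name `fa_to_grammar` is hypothetical. The transition list is read back from the rules of G above (the figure itself is not reproduced here), with B and E taken to be the accepting states because G contains B → ε and E → ε.

```python
def fa_to_grammar(transitions, accepting):
    """transitions: list of (state, symbol, next_state) triples whose states
    are already labeled with nonterminals; the start state is labeled "S"."""
    rules = {}
    for p, symbol, q in transitions:
        # A transition p --symbol--> q becomes the rule p -> symbol q.
        rules.setdefault(p, []).append(symbol + q)
    for p in accepting:
        # An accepting state p contributes the rule p -> ε.
        rules.setdefault(p, []).append("ε")
    return rules

# Transitions inferred from the rules of G (an assumption, see above).
example = [
    ("S", "a", "S"), ("S", "a", "A"),
    ("A", "b", "B"),
    ("B", "b", "B"), ("B", "b", "S"), ("B", "a", "D"),
    ("D", "a", "C"),
    ("C", "a", "B"), ("C", "c", "E"),
]
rules = fa_to_grammar(example, accepting=["B", "E"])
for head, bodies in rules.items():
    print(head, "→", " | ".join(bodies))
```

Run on this input, the sketch prints one line per nonterminal with the same alternatives as G.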
Proof (2.1): Re ⇒ FA
Let R be a regular expression which expresses a regular language L. We construct
an FA which recognizes L. Let Σ be the alphabet of L. Following the inductive
definition of regular expressions, we show how to construct an FA that recognizes
the language expressed by R.
(1) If R is one of the regular expressions φ, ε, or a, for a symbol a ∈ Σ, which
respectively express the languages φ (the empty set), {ε}, and {a}, we construct an
FA which recognizes the respective language as follows:
[Figure: three FA recognizing φ, {ε}, and {a}, respectively]
(2) Suppose that we have constructed two FA, M1 and M2, which recognize the
languages expressed by regular expressions r1 and r2, respectively:
L(M1) = L(r1), L(M2) = L(r2)
(3) Using M1 and M2, we construct FA M1+2, M12, and M1*, which, respectively,
recognize the languages expressed by the regular expressions r1 + r2, r1r2, and (r1)*, as
follows.
To construct the FA M1+2, introduce a new start state and link it with ε-transitions to the
start states of M1 and M2, as the following figure illustrates. Clearly, L(M1+2) = L(r1 + r2).
[Figure (b): the FA M12, with L(M12) = L(r1r2)]
A car was involved in an accident. As one might expect, a large crowd gathered. A newspaper reporter, anxious to
get his story, pushed and struggled to get near the car. Being a clever sort, he started shouting loudly, “Let me
through! Let me through please! I am the son of the victim.” The crowd made way for him. Lying in front of the
car was a donkey.
- Anonymous -
[Figure: the FA M1*, built from M1 (with L(M1) = L(r1)) by adding a new start state and ε-transitions]
Notice that if we construct M1* by reusing the old start state, as illustrated in
figure (a) below, instead of introducing a new one, the resulting FA may accept a string
not in L(r1*). To see why, consider an FA M1 whose start state is on a cycle, as shown in
figure (b) below. For this M1, string ab is not in L(r1*), so M1* should not accept it.
However, the FA in figure (c), built as in figure (a), accepts ab.
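[Figures: (a) M1* built by reusing the old start state of M1; (b) an FA M1 whose start state is on a cycle; (c) the FA obtained from (b) by the construction in (a), which accepts ab]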
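The pitfall can be checked on a small machine. The sketch below uses a hypothetical M1, not the one in the figure: start state p with a loop on b and an a-transition to the accepting state q, so L(M1) = b*a and ab is not in L(M1)*. The helper names are assumptions as well, and `""` stands for ε.

```python
def closure(delta, states):
    """States reachable from `states` by ε-transitions alone."""
    stack, seen = list(states), set(states)
    while stack:
        p = stack.pop()
        for q in delta.get((p, ""), ()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen

def accepts(delta, start, accepting, word):
    current = closure(delta, {start})
    for ch in word:
        moved = set()
        for p in current:
            moved |= delta.get((p, ch), set())
        current = closure(delta, moved)
    return bool(current & accepting)

def with_edges(delta, extra):
    """Copy of delta with the extra (state, symbol, state) edges added."""
    merged = {key: set(targets) for key, targets in delta.items()}
    for p, symbol, q in extra:
        merged.setdefault((p, symbol), set()).add(q)
    return merged

# Hypothetical M1: L(M1) = b*a, start state p lies on the b-loop.
M1 = {("p", "b"): {"p"}, ("p", "a"): {"q"}}

# Naive star: reuse p as an accepting start state, add ε from q back to p.
naive = (with_edges(M1, [("q", "", "p")]), "p", {"p", "q"})

# Star with a new accepting start state, as the construction prescribes.
proper = (with_edges(M1, [("new", "", "p"), ("q", "", "new")]), "new", {"new", "q"})

print(accepts(*naive, "ab"), accepts(*proper, "ab"))
```

In the naive version, reading a and then b returns to p through the loop, and p is now accepting, so ab is wrongly accepted. With the new start state, p itself never becomes accepting.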
Example: Constructing an FA for a given regular expression.
Based on the approach given above, we show a step-wise construction of an FA
which recognizes the language expressed by the regular expression ((ab + ε )ba)*.
[Figure: step-wise construction of the FA, building in turn the FA for a and b, ab, ab + ε, ba, (ab + ε)ba, and ((ab + ε)ba)*, with the parts linked by ε-transitions]
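The step-wise construction for ((ab + ε)ba)* can be mirrored with a few combinators. This is a sketch under assumptions: the class and function names are hypothetical, `""` stands for ε, and concatenation is assumed to link each accepting state of the first FA to the start state of the second by an ε-transition (the standard construction).

```python
from itertools import count

_ids = count()   # source of fresh state names

class NFA:
    def __init__(self, start, accepting, edges):
        self.start = start
        self.accepting = set(accepting)
        self.edges = list(edges)        # (state, symbol, state) triples

def symbol(a):
    p, q = next(_ids), next(_ids)
    return NFA(p, {q}, [(p, a, q)])

def epsilon():
    p = next(_ids)
    return NFA(p, {p}, [])

def union(m1, m2):
    # New start state with ε-transitions to both old start states.
    s = next(_ids)
    links = [(s, "", m1.start), (s, "", m2.start)]
    return NFA(s, m1.accepting | m2.accepting, m1.edges + m2.edges + links)

def concat(m1, m2):
    # Assumed construction: ε from every accepting state of m1 to m2's start.
    links = [(f, "", m2.start) for f in m1.accepting]
    return NFA(m1.start, m2.accepting, m1.edges + m2.edges + links)

def star(m1):
    # New accepting start state; ε to the old start, ε back from old accepting states.
    s = next(_ids)
    links = [(s, "", m1.start)] + [(f, "", s) for f in m1.accepting]
    return NFA(s, {s} | m1.accepting, m1.edges + links)

def accepts(m, word):
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            p = stack.pop()
            for x, a, y in m.edges:
                if x == p and a == "" and y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen
    current = closure({m.start})
    for ch in word:
        current = closure({y for x, a, y in m.edges if x in current and a == ch})
    return bool(current & m.accepting)

ab = concat(symbol("a"), symbol("b"))
ba = concat(symbol("b"), symbol("a"))
m = star(concat(union(ab, epsilon()), ba))
print([w for w in ["", "ba", "abba", "ab", "abab"] if accepts(m, w)])
```

Each sub-automaton is built from fresh states and used exactly once, which keeps the pieces disjoint, as the inductive construction assumes.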
Proof (2.2): FA ⇒ Re
Let M be an FA. We will show a method for systematically transforming the state
transition graph of M into a regular expression which expresses L(M).
We first rewrite all the edge labels (i.e., the input symbols) in the transition
graph as regular expressions (see figure (b) below). We then interpret the
transition graph as follows: if there is an edge from a state p to a state q labeled
with a regular expression r, then M, in state p reading any string in
L(r), enters state q.
Extending the function δ, we denote this observation by δ(p, r) = q. Clearly,
labeling the edges with regular expressions this way does not affect
the language accepted by M.
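[Figures: (a) a transition graph on states 1 to 5 with edges labeled by input symbols, such as a, b and a, b, c; (b) the same graph with those labels rewritten as the regular expressions a+b and a+b+c]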
Now, let G be the state transition graph with edges labeled with regular
expressions. We eliminate states from G one at a time (all except the start state and the
accepting states), adjusting the edges and their labels so that the language
recognized by the automaton is unaffected. The following example shows how.
[Figure: eliminating state 2 gives an edge labeled a(a+b)*(a+b+c) from state 1 to state 5; eliminating states 3 and 4 gives a second edge labeled baa from state 1 to state 5; merging the two edges gives a single edge labeled a(a+b)*(a+b+c)+baa.]
Clearly, the label a(a+b)*(a+b+c)+baa on the edge from the start state to
accepting state 5 is a regular expression which denotes the set of strings accepted by
state 5 of the automaton.
The following figures show a typical case of eliminating a state from a state
transition graph and manipulating the edge labels with a regular expression.
Clearly, the same idea works even when a complex regular expression is
substituted for each simple regular expression in the graph.
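[Figure: eliminating a state q that has a self-loop labeled f, incoming edges labeled a and d, and outgoing edges labeled b and c; afterwards the neighboring states r and s are connected by edges labeled af*b, af*c, df*b, and df*c.]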
If the state transition graph has k (≥ 1) accepting states, the language L(M) is the
union of the languages accepted by these accepting states. Suppose that M has k
accepting states and that, for each i, we find a regular expression ri that denotes the
language accepted by the i-th accepting state. Then the regular expression r denoting the
language accepted by M is given as follows.
r = r1 + r2 + ... + rk
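A single elimination step can be sketched in Python as below. The helper names are hypothetical, and labels are assumed to be either single symbols or already parenthesized, so that plain string concatenation is safe. The sample graph is one possible layout consistent with the labels in the figure (r to q on a, s to q on d, q to r on b, q to s on c, loop f on q); the actual layout of the figure is not known.

```python
def union(r, s):
    """r + s, where None stands for 'no edge'."""
    if r is None:
        return s
    if s is None:
        return r
    return f"({r}+{s})"

def star(r):
    """r*, with the star of a missing loop (None) or of ε equal to ε."""
    if r is None or r == "ε":
        return "ε"
    return f"{r}*" if len(r) == 1 else f"({r})*"

def concat(*parts):
    """Concatenation that drops ε factors."""
    kept = [p for p in parts if p != "ε"]
    return "".join(kept) if kept else "ε"

def eliminate(edges, q):
    """Remove state q from a graph given as {(from, to): label}."""
    loop = star(edges.get((q, q)))
    incoming = [(p, lab) for (p, t), lab in edges.items() if t == q and p != q]
    outgoing = [(t, lab) for (p, t), lab in edges.items() if p == q and t != q]
    result = {key: lab for key, lab in edges.items() if q not in key}
    for p, lab_in in incoming:
        for t, lab_out in outgoing:
            bypass = concat(lab_in, loop, lab_out)      # in · loop* · out
            # Merge with an existing parallel edge, if any.
            result[(p, t)] = union(result.get((p, t)), bypass)
    return result

graph = {("r", "q"): "a", ("s", "q"): "d", ("q", "q"): "f",
         ("q", "r"): "b", ("q", "s"): "c"}
print(eliminate(graph, "q"))
```

Repeating `eliminate` for every state other than the start state and one chosen accepting state leaves the small graph from which ri is read off; merging of parallel edges happens inside the step through `union`.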
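[Figures: (a) an example FA with states 0 to 4, in which the start state 0 and state 4 are accepting; (b) the graph reduced for accepting state 4, giving r4; (c) the graph reduced for accepting state 0, giving r0. The result is r = r0 + r4.]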
To compute r4, we change state 0, the start state, to a non-accepting state and,
keeping the start state and the accepting state 4, eliminate all the other states
one by one.
We begin by eliminating state 2. The order of elimination does not matter:
although the resulting regular expression may differ, it expresses the same language.
It is more convenient to eliminate first a state that involves fewer transition edges.
[Figure: the transition graph before and after eliminating state 2; the elimination introduces edges labeled ba.]
Parallel transition edges (i.e., edges having the same origin and destination) are
merged into one edge with all the labels merged into one using the operator +. In
the figures below, notice how the two transitions from state 1 to state 4 and the two
loop transitions on state 1 are, respectively, merged into one.
[Figure: merging parallel edges. The two loops on state 1, labeled a and ba, become one loop labeled a+ba, and the two edges from state 1 to state 4, labeled b and ε, become one edge labeled b+ε.]
[Figure: state 3 is eliminated next, which introduces edges labeled bba, ba, and bb; the parallel edges are then merged, for example b and bb into b+bb.]
[Figure: eliminating state 1. The remaining states 0 and 4 gain the edges a(a+ba)*b, a(a+ba)*(b+ε), bba(a+ba)*b, and bba(a+ba)*(b+ε), in addition to the existing edges b, ba, and b+bb.]
[Figure: merging the parallel edges leaves states 0 and 4 with a loop a(ba+a)*b+b on state 0, an edge a(a+ba)*(b+ε) from 0 to 4, an edge bba(a+ba)*b+ba from 4 to 0, and a loop bba(ba+a)*(b+ε)+(b+bb) on state 4.]
In general, with all the states eliminated except for the start state and one accepting
state, we get a transition graph, as shown in figure (i) below. We need one more
step.
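[Figure: the two-state graph on states 0 and 4, with its edge labels abbreviated as follows]
r00 = a(ba+a)*b+b (loop on state 0)
r04 = a(a+ba)*(b+ε) (edge from state 0 to state 4)
r40 = bba(a+ba)*b+ba (edge from state 4 to state 0)
r44 = bba(ba+a)*(b+ε)+(b+bb) (loop on state 4)
r4 = (r00)*r04(r44 + r40(r00)*r04)*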
[Figures (h) and (i): the two-state graph on states 0 and 4 (h) is reduced to the single start state 0 with a loop labeled r00 + r04(r44)*r40 (i).]
Finally, substituting the full expressions back for the short notation in r0 + r4, we get a regular
expression r which denotes the language recognized by the given FA.
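The substitution can be done mechanically, and the result tried out with Python's `re` module. This is a sketch under assumptions: `to_python` and `matches` are hypothetical helpers, the translation simply maps + to | and ε to the empty alternative, and r0 is taken to be the star of the loop label r00 + r04(r44)*r40 from figure (i).

```python
import re

def to_python(regex):
    """Translate the chapter's notation (+ for union, ε) to Python re syntax."""
    return regex.replace(" ", "").replace("+", "|").replace("ε", "")

# Edge labels of the two-state graph, parenthesized so they can be combined.
r00 = "(a(ba+a)*b+b)"
r04 = "(a(a+ba)*(b+ε))"
r40 = "(bba(a+ba)*b+ba)"
r44 = "(bba(ba+a)*(b+ε)+(b+bb))"

r4 = f"({r00})*{r04}({r44}+{r40}({r00})*{r04})*"
r0 = f"({r00}+{r04}({r44})*{r40})*"      # assumed: star of the loop in figure (i)
r = f"({r0})+({r4})"

pattern = re.compile(to_python(r))

def matches(word):
    return pattern.fullmatch(word) is not None

for w in ["", "a", "b", "ab", "ba", "c"]:
    print(repr(w), matches(w))
```

Trying a handful of short strings this way is only a sanity check on the algebra, not a proof that r denotes L(M).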
Rumination (1): FA ⇒ Re
• The state elimination technique is easy to carry out with paper and pencil for a simple graph, but it becomes messy
for a large graph. There is a beautiful algorithmic approach based on dynamic programming, usually attributed to Kleene (the name
CYK properly refers to a dynamic-programming parsing algorithm for context-free grammars). This algorithm is presented in Appendix C.
• Depending on the order in which the states are eliminated, we may get different regular expressions. However, they are
equivalent, i.e., they all denote the language recognized by the FA.
• It is an intractable problem (i.e., solvable, but with no efficient algorithm known) to tell whether two arbitrary regular
expressions are equivalent. However, for some particular cases, especially if the expressions are simple, we can solve the problem
using the techniques available in this text. The figures below prove the two equivalences (1) and (2). Notice that the same
equivalences hold even when an arbitrary regular expression is substituted for a or b in the expressions. (In the following chapter,
we will learn how to convert an NFA to a DFA and a method to minimize the number of states of a DFA.)
(1) a* = (a*)*    (2) (a + b)* = (a*b*)*
[Figure: each of (a*)* and (a*b*)* is converted to an FA, its ε-transitions are eliminated, it is converted to a DFA, the number of states is minimized, and the result is converted back to a regular expression, giving a* and (a + b)*, respectively.]
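Equivalences like (1) and (2) can also be spot-checked by brute force. The sketch below compares the sets of strings up to a bounded length that each expression matches, using Python's `re` syntax (| instead of +). The helper name `language` and the length bound are assumptions, and agreement up to a bound is evidence, not a proof.

```python
import re
from itertools import product

def language(pattern, alphabet, max_len):
    """All strings over `alphabet` of length <= max_len matched in full by `pattern`."""
    compiled = re.compile(pattern)
    found = set()
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            if compiled.fullmatch(word):
                found.add(word)
    return found

same1 = language("a*", "ab", 6) == language("(a*)*", "ab", 6)
same2 = language("(a|b)*", "ab", 6) == language("(a*b*)*", "ab", 6)
differ = language("(ab)*", "ab", 4) != language("a*b*", "ab", 4)
print(same1, same2, differ)
```

The third comparison is a contrast case: (ab)* and a*b* already disagree on the string a, so the bounded check does tell non-equivalent expressions apart when a short witness exists.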
Exercises
7.1 Using the technique presented in Section 7.2, transform the following state transition graph of an FA into a
regular grammar which generates the same language recognized by the FA.
[Figure: state transition graph of an FA with transitions labeled a, b, and ε]
7.2 Using the technique presented in Section 7.2, transform the following regular grammar into the state transition graph of
an FA which recognizes the same language generated by the grammar.
S → abS | cC    A → aS | B | a    B → aA | ε    C → aB | abc
7.3 Compute a regular expression that denotes the language recognized by the FA shown in problem 7.1 above. You should
also show the procedure that you took to get your answer.
7.4 Let L be the language denoted by the following regular expression. (a) Construct the state transition graph of an FA
that recognizes L, and (b) Construct a regular grammar that generates L. You should also show the procedure that you took
to find each answer.
((ba + b)* + (cd(a + b))*)bba