11.3 Constructing Efficient Finite Automata
Be sure to draw a picture of this minimum-state DFA.
EXAMPLE 4. We'll compute the minimum-state DFA for the DFA given by
the following transition table:
[Transition table: only the row labels Start, Final, Final survive.]
The set of states is S = {0, 1, 2, 3, 4, 5}. For Step 1 we get the following
sequence of relations:
compute T_min({1, 2, 3}, b) and T_min({4, 5}, a) as follows:
    T_min({1, 2, 3}, b) = T_min([1], b) = [T(1, b)] = [1] = {1, 2, 3},
    T_min({4, 5}, a) = T_min([4], a) = [T(4, a)] = [4] = {4, 5}.
Similar computations will yield the table for T_min, which we'll leave as an
exercise.
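The rule used in these computations, T_min([s], a) = [T(s, a)], can be sketched in Python. The transition table below is a hypothetical stand-in, not the table of this example; it was chosen only so that T(1, b) = 1 and T(4, a) = 4 and so that the assumed classes {0}, {1, 2, 3}, and {4, 5} are respected by every transition. The names T, partition, class_of, and t_min are illustrative.

```python
# Hypothetical DFA over {a, b}; chosen so that T(1, b) = 1 and T(4, a) = 4.
T = {
    (0, "a"): 1, (0, "b"): 2,
    (1, "a"): 4, (1, "b"): 1,
    (2, "a"): 5, (2, "b"): 3,
    (3, "a"): 4, (3, "b"): 2,
    (4, "a"): 4, (4, "b"): 5,
    (5, "a"): 5, (5, "b"): 4,
}

# Assumed equivalence classes of states (an assumption for this sketch).
partition = [frozenset({0}), frozenset({1, 2, 3}), frozenset({4, 5})]


def class_of(state):
    """Return the equivalence class [state]."""
    return next(block for block in partition if state in block)


def t_min(block, letter):
    """Compute T_min([s], a) = [T(s, a)] using one representative s."""
    representative = min(block)
    return class_of(T[(representative, letter)])


# The full table for T_min, keyed by (class, letter).
table = {(block, letter): t_min(block, letter)
         for block in partition for letter in "ab"}
```

Any representative of a class gives the same result, which is exactly what makes T_min well defined.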
Exercises
1. Use (11.7) to construct an NFA for each of the following regular expressions.
a. a*b*. b. (a + b)*. c. a* + b*.
2. Construct a DFA table for the following NFA in two different ways using
(11.8):
a. Take unions of lambda closures of the NFA entries.
b. Take lambda closures of unions of the NFA entries.
a. a*b*. b. (a + b)*. c. a* + b*.
6. Let the set of states for a DFA be S = {0, 1, 2, 3, 4, 5}, where the start state
is 0 and the final states are 2 and 5. Let the equivalence relation on S for
a minimum-state DFA be generated by the following set of equivalent
pairs of states:
{{0, 1}, {0, 4}, {1, 4}, {2, 5}}.
Write down the states of the minimum-state DFA.
7. Finish Example 4 by calculating the transition table for the new
minimum-state DFA.
8. Given the following DFA table, use algorithm (11.10) to compute the
minimum-state DFA. Answer parts (a), (b), and (c) as you proceed through
the algorithm.
[DFA table: entries did not survive.]
a. Write down the set of equivalent pairs.
b. Write down the states of the minimum-state DFA.
c. Write down the transition table for the minimum-state DFA.
9. For each of the following DFAs, use (11.10) to find the minimum-state DFA.
11. Suppose we're given the following NFA table:
              a      b      Λ
Start   0    {1}    {1}    {2}
        1    [entries illegible]
Final   2    ∅      {0}    [illegible]
Find a simple regular expression for the regular language recognized by
this NFA. Hint: Transform the NFA into a DFA, and then find the
minimum-state DFA.
12. What can you say about the regular language accepted by a DFA in which
all states are final?
11.4 Regular Language Topics
We've already seen characterizations of regular languages by regular
expressions, languages accepted by DFAs, and languages accepted by NFAs. In this
section we'll introduce still another characterization of regular languages in
terms of certain restricted grammars. We'll discuss some properties of regular
languages that can be used to find languages that are not regular.
Regular Grammars
A regular language can be described by a special kind of grammar in which
the productions take a certain form. A grammar is called a regular grammar
if each production takes one of the following two forms, where the capital
letters are nonterminals and w is a string of terminals or Λ:
    S → w,
    S → wT.
Regular Expression    Regular Grammar
a*                    S → Λ | aS
(a + b)*              S → Λ | aS | bS
a* + b*               S → Λ | A | B
                      A → a | aA
                      B → b | bB
a*b                   S → b | aS
ba*                   S → bA
                      A → Λ | aA
(ab)*                 S → Λ | abS
The last three examples in the preceding list involve products of languages.
Most problems occur in trying to construct a regular grammar for a
language that is the product of languages. Let’s look at an example to see
whether we can get some insight into constructing such grammars.
Suppose we want to construct a regular grammar for the language of the
regular expression a*bc*. First we observe that the strings of a*bc* start with
either the letter a or the letter b. We can represent this property by writing
down the following two productions, where S is the start symbol:
    S → aS | bC.
These productions allow us to derive strings of the form bC, abC, aabC, and
so on. Now all we need is a definition for C to derive the language of c*.
    S → Λ | aS | B
    B → b | bB.
Let's look at four sublanguages of {a^m b^n | m, n ∈ N} that are defined by
whether each string contains occurrences of a or b. The following list shows each
language together with a regular expression and a regular grammar.
Language                          Expression       Regular Grammar
{a^m b^n | m ≥ 0 and n > 0}       a*bb*            S → aS | B
                                                   B → b | bB.
{a^m b^n | m > 0 and n ≥ 0}       aa*b*            S → aA
                                                   A → aA | B
                                                   B → Λ | bB.
{a^m b^n | m > 0 and n > 0}       aa*bb*           S → aA
                                                   A → aA | B
                                                   B → b | bB.
{a^m b^n | m > 0 or n > 0}        aa*b* + a*bb*    S → aA | bB
                                                   A → Λ | aA | B
                                                   B → Λ | bB.
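One way to gain confidence in a grammar like these is to enumerate the short strings it derives and compare them with the strings matched by the regular expression. The following Python sketch does this for the grammar S → aA | bB, A → Λ | aA | B, B → Λ | bB and the expression aa*b* + a*bb*. The encoding of productions as (terminal string, next nonterminal or None) pairs and the names derive and matching are assumptions made for illustration; the check covers only strings up to a fixed length, so it is evidence rather than a proof.

```python
import re
from itertools import product

# Each production is (terminal string, next nonterminal or None).
GRAMMAR = {
    "S": [("a", "A"), ("b", "B")],
    "A": [("", None), ("a", "A"), ("", "B")],
    "B": [("", None), ("b", "B")],
}


def derive(grammar, start, max_len):
    """Return every terminal string of length <= max_len derivable from start."""
    results, seen, stack = set(), set(), [("", start)]
    while stack:
        form = stack.pop()
        if form in seen:
            continue
        seen.add(form)
        prefix, nonterminal = form
        for terminals, nxt in grammar[nonterminal]:
            word = prefix + terminals
            if len(word) > max_len:
                continue
            if nxt is None:
                results.add(word)          # derivation is finished
            else:
                stack.append((word, nxt))  # keep deriving from nxt
    return results


def matching(pattern, alphabet, max_len):
    """Return every string of length <= max_len that fully matches pattern."""
    return {"".join(p)
            for n in range(max_len + 1)
            for p in product(alphabet, repeat=n)
            if re.fullmatch(pattern, "".join(p))}


derived = derive(GRAMMAR, "S", 5)
expected = matching(r"aa*b*|a*bb*", "ab", 5)
```

Here the book's + is written as | in Python's regular expression syntax.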
Any regular language has a regular grammar; conversely, any regular
grammar generates a regular language. To see this, we'll give two algorithms:
one to transform an NFA to a regular grammar and the other to transform a
regular grammar to an NFA, where the language accepted by the NFA is
identical to the language generated by the regular grammar.
NFA to Regular Grammar (11.11)
Perform the following steps to construct a regular grammar that
generates the language of a given NFA:
1. Rename the states to a set of capital letters.
2. The start symbol is the NFA's start state.
3. For each transition from I to J labeled with a, create the production I → aJ.
acceptance path in the NFA for some string corresponds to a derivation by the
grammar for the same string. Let’s do an example.
EXAMPLE 2. Let's see how (11.11) transforms the following NFA into a
regular grammar:
[Figure: NFA diagram; only the label Start survives.]
The algorithm takes this NFA and constructs the following regular grammar
with start symbol S:
    S → aI | J
    I → bK
    J → aJ | aK
    K → Λ.
For example, to accept the string aa, the NFA follows the path S, J, J, K with
edges labeled Λ, a, a, respectively. The grammar derives this string with the
following sequence of productions:
    S → J, J → aJ, J → aK, K → Λ.
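The transformation in this example can be sketched in Python. The edge list below is an assumption: it is inferred from the grammar shown in the example (one edge per production with a nonterminal on the right, and K as the only final state), since the NFA diagram itself is not available. The rules for Λ-labeled edges and final states are likewise inferred from the productions S → J and K → Λ. The function name nfa_to_grammar is illustrative.

```python
LAMBDA = ""  # stands for the empty string Λ


def nfa_to_grammar(edges, finals):
    """Build productions from (state, label, state) edges and final states.

    An edge I --a--> J gives I -> aJ, an edge labeled Λ gives I -> J,
    and each final state K gives K -> Λ (rules inferred from the example).
    """
    grammar = {}
    for source, label, target in edges:
        grammar.setdefault(source, []).append(label + target)
        grammar.setdefault(target, [])
    for state in finals:
        grammar.setdefault(state, []).append("Λ")
    return grammar


# Edges assumed for the NFA of the example.
edges = [
    ("S", "a", "I"),
    ("S", LAMBDA, "J"),
    ("I", "b", "K"),
    ("J", "a", "J"),
    ("J", "a", "K"),
]
grammar = nfa_to_grammar(edges, finals={"K"})
```

With these assumed edges, the resulting dictionary lists the same productions as the grammar in the example.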
Now let's look at the converse problem.
Let's look at an algorithm to transform a regular grammar into an NFA.
Regular Grammar to NFA (11.12)
Perform the following steps to construct an NFA that accepts the language
of a given regular grammar:
1. If necessary, transform the grammar so that all productions have the
form A → x or A → xB, where x is either a single letter or Λ.
2. The start state of the NFA is the grammar's start symbol.
3. For each production I → aJ, construct a state transition from I to J
labeled with the letter a.
4. For each production I → J, construct a state transition from I to J
labeled with Λ.
5. If there are productions of the form I → a for some letter a, then create
a single new state symbol F. For each production I → a, construct a
state transition from I to F labeled with a.
6. The final states of the NFA are F together with all I for which there is
a production I → Λ.
End of Algorithm
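The steps of this algorithm can be sketched in Python as follows. The sketch assumes that step 1 has already been carried out, that productions are encoded as (head, letter, tail) triples with "" standing for Λ and None for a missing tail, and that the symbol F is not already used as a nonterminal. The names grammar_to_nfa, closure, and accepts are illustrative; the small NFA simulator is included only to try the result.

```python
LAMBDA = ""  # stands for the empty string Λ


def grammar_to_nfa(productions, start):
    """Turn (head, letter, tail) productions into (start, edges, finals)."""
    edges, finals = set(), set()
    for head, letter, tail in productions:
        if tail is not None:
            edges.add((head, letter, tail))   # steps 3 and 4
        elif letter == LAMBDA:
            finals.add(head)                  # step 6: production I -> Λ
        else:
            edges.add((head, letter, "F"))    # step 5: production I -> a
            finals.add("F")
    return start, edges, finals


def closure(states, edges):
    """Add every state reachable through Λ-labeled edges."""
    seen, stack = set(states), list(states)
    while stack:
        state = stack.pop()
        for source, label, target in edges:
            if source == state and label == LAMBDA and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def accepts(nfa, word):
    """Simulate the NFA on word."""
    start, edges, finals = nfa
    current = closure({start}, edges)
    for ch in word:
        moved = {t for s, label, t in edges if s in current and label == ch}
        current = closure(moved, edges)
    return bool(current & finals)


# Grammar: S -> aI | J, I -> bK, J -> aJ | aK, K -> Λ.
nfa = grammar_to_nfa(
    [("S", "a", "I"), ("S", LAMBDA, "J"), ("I", "b", "K"),
     ("J", "a", "J"), ("J", "a", "K"), ("K", LAMBDA, None)],
    start="S",
)
```

For this grammar the NFA accepts ab and any nonempty string of a's, which matches the strings the grammar derives.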
It's easy to see that the language of the NFA is the same as the language
of the given regular grammar because the productions used in the derivation
of any string correspond exactly with the state transitions on the path of
acceptance for the string. Here's an example.
Properties of Regular Languages
We need to face the fact that not all languages are regular. To see this, let's
look at a classic example. Suppose we want to find a DFA or NFA to recognize
the following language:
    {a^n b^n | n ≥ 0}.
After a few attempts at trying to find a DFA or an NFA or a regular expression
or a regular grammar, we might get the idea that it can’t be done. But how
can we be sure that a language is not regular? We can try to prove it. A proof
usually proceeds by assuming that the language is regular and then trying to
find a contradiction of some kind. For example, we might be able to find some
property of regular languages that the given language doesn’t satisfy. So let’s
look at a few properties of regular languages.
A useful property of regular languages comes from the observation that
any DFA for an infinite regular language must contain a cycle to recognize
infinitely many strings. For example, suppose we have a DFA with n states
that recognizes an infinite language. Since the language is infinite, we can
pick a string with more than n letters. To accept the string, the DFA must
enter some state twice (by the pigeonhole principle). So any DFA for an
infinite regular language must contain a subgraph that is a cycle. We can
symbolize the situation as follows, where each arrow represents a path that
may contain other states of the DFA and x, y, and z are the strings of letters
along each path:
This grammar derives all strings of the form xy^k z for all k ≥ 0. For example,
the string xy^3 z can be derived as follows:
    xN ⇒ xyN ⇒ xyyN ⇒ xyyyN ⇒ xyyyz.
The property that we've been discussing is called the pumping property
because the string y can be pumped up to y^k by traveling through the same
loop k times. Our discussion serves as an informal proof of the following
pumping lemma:
Pumping Lemma (Regular Languages) (11.13)
Let L be an infinite regular language over the alphabet A. Then there
exist strings x, y, z ∈ A*, where y ≠ Λ, such that xy^k z ∈ L for all k ≥ 0.
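The pigeonhole argument behind the lemma can be tried out in Python. The sketch below runs a DFA on a string, stops at the first state that is visited twice, and splits the string into x, y, z at that point. The DFA used here (strings over {a, b} with an even number of a's) is a hypothetical example, and the names accepts and pump_split are illustrative.

```python
# Hypothetical DFA: strings over {a, b} with an even number of a's.
START, FINALS = 0, {0}
T = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}


def accepts(word):
    """Run the DFA and report whether it ends in a final state."""
    state = START
    for ch in word:
        state = T[(state, ch)]
    return state in FINALS


def pump_split(word):
    """Split word into x, y, z at the first state the DFA visits twice."""
    first_seen = {START: 0}
    state = START
    for position, ch in enumerate(word, start=1):
        state = T[(state, ch)]
        if state in first_seen:
            i = first_seen[state]
            return word[:i], word[i:position], word[position:]
        first_seen[state] = position
    return None  # word is too short to force a repeated state


x, y, z = pump_split("aba")
pumped = [x + y * k + z for k in range(6)]
```

Because y labels a loop from a state back to itself, every string in pumped is accepted whenever the original string is.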
If an infinite language does not satisfy the conclusion of (11.13), then it
can’t be regular. So we can sometimes use this fact to prove that a language
is not regular. Here’s an example.
EXAMPLE 4. Let's show that the language L = {a^n b^n | n ≥ 0} is not regular.
We'll assume, by way of contradiction, that L is regular. Then we can use the
pumping lemma (11.13) to assert the existence of strings x, y, z (y ≠ Λ) such
that xy^k z ∈ L for all k ≥ 0. Let k = 1. Then there is some n such that
    xyz = a^n b^n.
Since y ≠ Λ, there are three possibilities for y:
1. y consists of all a's.
2. y consists of all b's.
3. y consists of one or more a's followed by one or more b's.
the language is regular. The reason is that palindromes satisfy the conclusion
of (11.13). For example, over the alphabet {a, b}, suppose we let x = a, y = aba,
and z = a. Then certainly xy^n z is a palindrome for all n ≥ 0.
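A quick Python check of this claim for small n, as a sanity test rather than a proof (the helper name is_palindrome is illustrative):

```python
def is_palindrome(word):
    """A string is a palindrome if it equals its own reverse."""
    return word == word[::-1]


x, y, z = "a", "aba", "a"

# The strings x y^n z for n = 0, 1, ..., 7.
pumped = [x + y * n + z for n in range(8)]
all_palindromes = all(is_palindrome(w) for w in pumped)
```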
There is another version of the pumping lemma for infinite regular
languages. It can sometimes be used when (11.13) does not do the job, because
it specifies some additional properties of infinite regular languages. We'll state
this second pumping lemma as follows:
Second Pumping Lemma (Regular Languages) (11.14)
Let L be an infinite regular language over the alphabet A, and suppose
that L can be accepted by a DFA with m states. Then for every string s ∈ L
such that |s| ≥ m there exist strings x, y, z ∈ A*, where y ≠ Λ, such that
s = xyz, |xy| ≤ m, and xy^k z ∈ L for all k ≥ 0.
The proof of (11.14) is a refinement of the discussion preceding (11.13),
and we'll leave it as an exercise. Let’s see how (11.14) can be used to prove
that the language consisting of all palindromes over an alphabet with two or
more letters is not regular.
EXAMPLE 5. Suppose P is the language of palindromes over the alphabet of
two letters a and b. To prove that P is not regular, we'll assume that P is
regular and try for a contradiction. If P is regular, then there is a DFA with
m states to recognize P. Let’s choose a palindrome of the following form:
    s = a^(m+1) b a^(m+1).
Now we can apply (11.14) to assert the existence of strings x, y, z such
that ...
Regular languages can be combined by union, language product, and closure to form
new regular languages. We'll list these three properties along with two others.
Properties of Regular Languages (11.15)
1. The union of two regular languages is regular.
2. The language product of two regular languages is regular.
3. The closure of a regular language is regular.
4. The complement of a regular language is regular.
5. The intersection of two regular languages is regular.
We'll prove statement 4 and leave statement 5 as an exercise. If D is a
DFA for the regular language L, then construct a new DFA, say D’, from D
by making all the final states nonfinal and by making all the nonfinal states
final. If we let A be the alphabet for L, it follows that D’ recognizes the
complement A* - L. Thus the complement of L is regular. QED.
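The construction in this proof is short enough to try in Python. The sketch assumes a DFA whose transition table is total (every state has a transition for every letter), which is what makes swapping final and nonfinal states sufficient. The DFA shown (strings with an even number of a's) is a hypothetical example, and the names are illustrative.

```python
from itertools import product

# Hypothetical DFA D over {a, b}: strings with an even number of a's.
STATES, START = {0, 1}, 0
T = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}
FINALS = {0}

# D' keeps the states and transitions of D and swaps final with nonfinal.
COMPLEMENT_FINALS = STATES - FINALS


def accepts(word, finals):
    """Run the DFA on word using the given set of final states."""
    state = START
    for ch in word:
        state = T[(state, ch)]
    return state in finals


# All strings over {a, b} of length at most 4.
words = ["".join(p) for n in range(5) for p in product("ab", repeat=n)]
disagree_everywhere = all(
    accepts(w, COMPLEMENT_FINALS) != accepts(w, FINALS) for w in words
)
```

Every string is accepted by exactly one of D and D', so D' recognizes the complement.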
EXAMPLE 6. Suppose L is the language over alphabet {a, b} consisting of all
strings with an equal number of a's and b's. For example, the strings abba,
ab, and babbaa are all in L. Is L a regular language? To see that the answer
is no, consider the following argument:
Let M be the language of the regular expression a*b*. It certainly follows
that L ∩ M = {a^n b^n | n ≥ 0}. Now we're in a position to use (11.15). Suppose to
the contrary that L is regular. We know that M is regular because it is the
language of the regular expression a*b*. Therefore L ∩ M must be regular by
(11.15). In other words, {a^n b^n | n ≥ 0} must be regular. But we know that
{a^n b^n | n ≥ 0} is not regular. Therefore our assumption that L is regular leads
to a contradiction. Thus L is not regular.
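The key step of this argument, that L ∩ M contains exactly the strings a^n b^n, can be checked by brute force for short strings. The Python sketch below is a finite check up to an arbitrarily chosen length bound, not a proof; the names words, in_L, and in_M are illustrative.

```python
import re
from itertools import product


def words(alphabet, max_len):
    """Generate every string over alphabet of length at most max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def in_L(word):
    """Equal number of a's and b's."""
    return word.count("a") == word.count("b")


def in_M(word):
    """Member of the language of a*b*."""
    return re.fullmatch(r"a*b*", word) is not None


MAX_LEN = 8
intersection = {w for w in words("ab", MAX_LEN) if in_L(w) and in_M(w)}
expected = {"a" * n + "b" * n for n in range(MAX_LEN // 2 + 1)}
```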
We'll finish by listing two interesting properties of regular languages that
can also be used to determine nonregularity.
of the form S → f(w) and S → f(w)T. The new grammar is regular, and any
string in f(L) is derived by this new grammar. QED.
EXAMPLE 7. Let's use (11.16) to show that the language L = {a^n b c^n | n ∈ N} is
not regular. We can define a morphism f: {a, b, c}* → {a, b, c}* by f(a) = a,
f(b) = Λ, and f(c) = b. Then f(L) = {a^n b^n | n ≥ 0}. If L is regular, then we must
also conclude by (11.16) that f(L) is regular. But we know that f(L) is not
regular. Therefore L is not regular.
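The morphism of this example is easy to apply mechanically. The Python sketch below encodes f as a dictionary from letters to strings and applies it letter by letter; the names F and apply_morphism are illustrative.

```python
# The morphism f: f(a) = a, f(b) = Λ (the empty string), f(c) = b.
F = {"a": "a", "b": "", "c": "b"}


def apply_morphism(mapping, word):
    """Apply a morphism to a string by replacing each letter with its image."""
    return "".join(mapping[ch] for ch in word)


# Images of a^n b c^n for n = 0, 1, ..., 5.
images = [apply_morphism(F, "a" * n + "b" + "c" * n) for n in range(6)]
```

Each image has the form a^n b^n, in line with f(L) = {a^n b^n | n ≥ 0}.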
There are some very simple nonregular languages. For example, we know
that the language {a^n b^n | n ≥ 0} is not regular, so finite automata are
not powerful enough to recognize it. Therefore finite automata can't recognize
some simple programming constructs, such as nested begin-end pairs, nested
open and closed parentheses, brackets, and braces. We'll see in the next
chapter that there are more powerful machines to recognize such constructs.
Exercises
1. Find a regular grammar for each of the following regular expressions.
a. a + b.          b. a + bc.                 c. a + b*.
d. ab* + c.        e. ab* + bc*.              f. a*bc* + ac.
g. (aa + bb)*.     h. (aa + bb)(aa + bb)*.    i. (ab)*c(a + b)*.
2. Find a regular grammar to describe each of the following languages.
a. {a, b, c}.
b. {aa, ab, ac}.
c. {a, b, ab, ba, abb, baa, ..., ab^n, ba^n, ...}.
d. {a, aaa, aaaaa, ..., a^(2n+1), ...}.
e. {Λ, a, abb, abbbb, ..., ab^(2n), ...}.
f. {Λ, a, b, c, aa, bb, cc, ..., a^n, b^n, c^n, ...}.
g. {Λ, a, b, ca, bc, cca, bcc, ..., c^n a, bc^n, ...}.
fake U(™ REN.
i. {a^m b c^n | m, n ∈ N}.
Find a regular ...
5. It's easy to see that the regular expression (a + b)*abb represents the
language recognized by the following NFA:
[Figure: NFA diagram; only the label Start survives.]
Use (11.11) to find a grammar for the language represented by the
NFA.
6. Use (11.12) to construct an NFA to recognize the language of the following
regular grammar:
    S → aI | bJ
    I → bI | Λ
    J → aJ | Λ.
7. The language {a^n b^n | n ≥ 0} is not regular. If it is regular, the pumping
lemma asserts the existence of strings x, y, z (y ≠ Λ) such that xy^k z is in
the language for all k ≥ 0. Let k = 1. Then there is some n such that
    xyz = a^n b^n.
Show that it is impossible for y to have the form: one or more a's
followed by one or more b's.
8. Show that each of the following languages is not regular:
a. {a^n b a^n | n ∈ N}.    b. {a^p | p is a prime number}.
Prove that the intersection of two regular languages is regular.
c. Define a morphism g: {a, b, c}* → {a, b, c}* such that
g({a^n b c^n | n ∈ N}) = {a^n b^n | n ∈ N}.
d. Argue that {a^n b a^n | n ∈ N} is not regular by using parts (a), (b), and
(c) together with (11.15) and (11.16).
Chapter Summary
The chapter introduced several formulations for regular languages. Regular
expressions are algebraic representations of regular languages. Finite automata
(DFAs and NFAs) are machines that recognize regular languages.
Regular grammars derive regular languages. The most important point is that
there are algorithms to transform any one of these formulations into any other
one. In other words, each formulation is equivalent to any of the other
formulations. There is also an algorithm to transform any DFA into a
minimum-state DFA. Therefore we can start with a regular language as
either a regular expression or a regular grammar and automatically transform
it into a minimum-state DFA to recognize the regular language.
We also observed some other things. Finite automata can be used as
output devices (Mealy and Moore machines). There are some very simple
interpreters for DFAs and NFAs. There are some basic properties of regular
languages given by the pumping lemmas, set operations, and morphisms.
Many simple languages are not regular. For example, the nonregularity of the
language {a^n b^n | n ≥ 0} tells us that finite automata can't recognize
simple constructs such as nested begin-end pairs.