FLA Chapter 3

Regular languages
Definition: Let Σ be an alphabet. The class

R of regular languages over Σ is defined as
follows:
1. ∅ is an element of R.
2. {λ} is an element of R.
3. For each a ∈ Σ, {a} is an element of R.
4. If L1 and L2 are any elements of R, then
(a) L1 ∪ L2 is an element of R.
(b) L1L2 is an element of R.
(c) L1∗ is an element of R.
Only languages that can be obtained by using

statements 1–4 are regular languages.
Example:
Let Σ = {0, 1}. Some regular languages over

{0, 1} are:
1. ∅
2. {λ}
3. {0}, {1}
4.(a) {0} ∪ {1} = {0, 1}
(b) {0}{1} = {01}, {1}{0} = {10}
(c) {0}∗, {1}∗
Further examples:
1. {0} ∪ {10} = {0, 10}
2. {0}{01} = {001}
3. ({1} ∪ {λ}){001}
4. {110}∗{0, 1}
5. {10, 111, 11010}∗
6. {0, 10}∗({11}∗ ∪ {001, λ})
A regular language can therefore be described
by an explicit formula. By convention, we may
simplify the formula slightly: We
• leave out {} or replace with (), and
• replace ∪ with +,
to obtain a regular expression.
Take care not to confuse the two notations:
either write {0, 1}∗, or (0 + 1)∗, but not
{0 + 1}∗.
Example:
Some regular expressions are:
1. ∅
2. λ
3. 0, 1
4.(a) 0 + 1
(b) 01, 10
(c) 0∗, 1∗
Further examples:
1. 0 + 10
2. 001
3. (1 + λ) 001
4. 110∗ (0 + 1)
5. (10 + 111 + 11010)∗
6. (0 + 10)∗ (11∗ + (001 + λ))
Regular expressions
Definition: Let Σ be an alphabet. The class

RE of regular expressions over Σ is defined as
follows:
1. ∅ is an element of RE.
2. λ is an element of RE.
3. For each a ∈ Σ, a is an element of RE.
4. If r1 and r2 are any elements of RE, then
(a) r1 + r2 is an element of RE.
(b) r1r2 is an element of RE.
(c) r1∗ is an element of RE.
Only expressions that can be obtained by using

statements 1–4 are regular expressions.
Simplifications

1. Exponential notation: write (rr) as r²
2. ‘Plus’ notation: write ((r∗) r) as r⁺
If we form regular expressions strictly according
to the definition, they are fully parenthesized.
In order to simplify such expressions, we
establish an order of precedence:
1. Highest: ∗
2. Then: concatenation
3. Lowest: +
Convention
If r1 and r2 are regular expressions, then r1 = r2
means that they correspond to the same language.
Thus
(a + ((b∗)c)) = a + b∗c
However,
(a + b)∗ ≠ a + b∗
Some examples of simplification:
1∗ (1 + λ) = 1∗
1∗1∗ = 1∗
0∗ + 1∗ = 1∗ + 0∗
(0∗1∗)∗ = (0 + 1)∗
(0 + 1)∗01(0 + 1)∗ + 1∗0∗ = (0 + 1)∗
We won’t study the algebra of regular expressions.
Think of these statements as statements about
the languages that the regular expressions
represent.
Consider
(0 + 1)∗01(0 + 1)∗ + 1∗0∗ = (0 + 1)∗
(0 + 1)∗ ≈ ({0} ∪ {1})∗
represents the language of all strings of 0’s and
1’s.
(0 + 1)∗01(0 + 1)∗ ≈ ({0} ∪ {1})∗{0}{1}({0} ∪ {1})∗
represents the language of all strings of 0’s and
1’s having the substring 01.
1∗0∗ ≈ {1}∗{0}∗
represents the language of all strings of 0’s and
1’s where the 1’s precede the 0’s, i.e., that do
not have the substring 01.
Example: The language L ⊆ {0, 1}∗ of all
strings of even length:
Any string in L can be divided into a number

of strings of even length.
Any concatenation of strings of even length

produces a string in L.
Therefore we can write

L = {00, 01, 10, 11}∗
which corresponds to the regular expression
(00 + 01 + 10 + 11)∗
We can also write

L = ({0, 1}{0, 1})∗
which corresponds to
((0 + 1) (0 + 1))∗
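Both expressions can be checked against the definition of L by brute force. The following is a minimal sketch in Python, assuming the usual translation into Python's re syntax (+ becomes |, and (0 + 1) becomes [01]); the names all_strings, EVEN_A, EVEN_B and agree_on_even_length are illustrative and not part of the notes.

```python
import re
from itertools import product

def all_strings(max_len, alphabet="01"):
    """Yield every string over the alphabet of length 0..max_len."""
    for n in range(max_len + 1):
        for tup in product(alphabet, repeat=n):
            yield "".join(tup)

# (00 + 01 + 10 + 11)* and ((0 + 1)(0 + 1))* written in Python regex syntax
EVEN_A = re.compile(r"(?:00|01|10|11)*")
EVEN_B = re.compile(r"(?:[01][01])*")

def agree_on_even_length(max_len=8):
    """True if both expressions match exactly the even-length strings."""
    for x in all_strings(max_len):
        expected = len(x) % 2 == 0
        if (EVEN_A.fullmatch(x) is not None) != expected:
            return False
        if (EVEN_B.fullmatch(x) is not None) != expected:
            return False
    return True
```

Such a check only covers strings up to a chosen length, so it can expose a wrong expression but does not prove a right one.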
Example: The language L ⊆ {0, 1}∗ containing
an odd number of 1’s:
Any string in L must contain at least one 1,

therefore we start with a string of the form
0ⁱ10ʲ
There is an even number of further 1’s, each

followed by zero or more 0’s. This we can see
as a repetition of strings of the form
(10ᵐ10ⁿ)
Therefore we can represent L by

0∗10∗(10∗10∗)∗
If we decide to stop the initial substring
immediately after the 1, we can represent L as:
0∗1(0∗10∗1)∗0∗
We can also concentrate on the last 1 in the

string:
(0∗10∗1)∗0∗10∗
or on a 1 in the middle, and make sure it has

an even number of 1’s on either side of it:
0∗(10∗10∗)∗1(0∗10∗1)∗0∗
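The four descriptions of "odd number of 1's" can be compared by brute force. The sketch below is one way to do this in Python; the patterns are our transcription of the four expressions into Python regex syntax, and the names ODD_ONES and check_odd_ones are illustrative.

```python
import re
from itertools import product

ODD_ONES = [
    r"0*10*(?:10*10*)*",             # initial 0^i 1 0^j, then pairs of 1's
    r"0*1(?:0*10*1)*0*",             # initial part stops right after the first 1
    r"(?:0*10*1)*0*10*",             # concentrate on the last 1
    r"0*(?:10*10*)*1(?:0*10*1)*0*",  # a middle 1 with an even number of 1's on each side
]

def check_odd_ones(max_len=10):
    """True if every pattern matches exactly the strings with an odd number of 1's."""
    for n in range(max_len + 1):
        for tup in product("01", repeat=n):
            x = "".join(tup)
            expected = x.count("1") % 2 == 1
            for pattern in ODD_ONES:
                if (re.fullmatch(pattern, x) is not None) != expected:
                    return False
    return True
```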
There may be many ways to describe a typical
element of a language, depending on which as-
pect we want to emphasize. There isn’t always
one that is the simplest or most natural.
The regular expression must be general enough

to describe every string in the language, but
not so general that it also describes strings
not in the language.
Note: In this course, don’t spend a lot of time

trying to simplify a regular expression if it isn’t
necessary for further work.
10∗(10∗10∗)∗
is not general enough, since it doesn’t include

strings beginning with 0.
Adding 0∗ at the beginning to obtain
0∗10∗(10∗10∗)∗

rectifies that.
Let L ⊆ {0, 1}∗ be the set of all strings of
length 6 or less.
A regular expression for L is
λ + 0 + 1 + 00 + 01+
10 + 11 + 000 + . . . + 111110 + 111111
which isn’t very elegant.
Let’s first describe the set of strings of length

exactly 6:
(0 + 1)(0 + 1)(0 + 1)(0 + 1)(0 + 1)(0 + 1)
or
(0 + 1)⁶
To include strings of length less than 6, we

allow each of the factors to be λ, i.e.,
(0 + 1 + λ)⁶
Let
L = {x ∈ {0, 1}∗ | x ends with 1 and
does not contain the substring 00}
We want a regular expression for L.
A string does not contain the substring 00

⇐⇒ no 0 can be followed by 0
⇐⇒ every 0 either comes at the end, or is
followed by 1.
Every string in L ends with 1

⇒ copies of the strings 01 and 1 account for
the entire string
⇒ every string in L corresponds to the regular
expression (1 + 01)∗.
(1 + 01)∗ allows λ, which doesn’t end with 1.
Therefore, (1 + 01)∗ is too general.
Possible fix - add 1 at the end - (1 + 01)∗1
Not correct, since we now exclude 01.
Add that possibility: (1 + 01)∗ (1 + 01)
Therefore, the final answer is (1 + 01)⁺.
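The trial-and-error steps above can be replayed mechanically. The sketch below assumes Python's re syntax for the three candidate expressions; in_L and first_mismatch are illustrative helper names, and the search only covers strings up to a chosen length.

```python
import re
from itertools import product

def in_L(x):
    """Membership test taken directly from the definition of L."""
    return x.endswith("1") and "00" not in x

def first_mismatch(pattern, max_len=8):
    """Return a shortest string on which the pattern and in_L disagree, or None."""
    for n in range(max_len + 1):
        for tup in product("01", repeat=n):
            x = "".join(tup)
            if (re.fullmatch(pattern, x) is not None) != in_L(x):
                return x
    return None

too_general = first_mismatch(r"(?:1|01)*")    # (1 + 01)*
too_strict = first_mismatch(r"(?:1|01)*1")    # (1 + 01)*1
final = first_mismatch(r"(?:1|01)+")          # (1 + 01)+
```

Here too_general is the empty string, too_strict is 01, and final is None for the lengths searched, which matches the reasoning on the slide.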
Let l (for ‘letter’) denote
a + b + ... + z + A + B + ... + Z
and d (for ‘digit’) denote
0 + 1 + ... + 9 .
An identifier in C is any string of length 1 or
more that contains only letters, digits, and
underscores (_), and begins with a letter or _.
Written as a regular expression:
(l + _)(l + d + _)∗
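For comparison, the same pattern can be written in Python's re syntax. This is only a sketch: it assumes ASCII letters and digits, treats the underscore as the extra permitted character, and ignores reserved words; IDENTIFIER and is_identifier are illustrative names.

```python
import re

# (l + _)(l + d + _)* with l = ASCII letter and d = ASCII digit
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def is_identifier(text):
    """True if the whole text matches the identifier pattern."""
    return IDENTIFIER.fullmatch(text) is not None
```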
Let
• a denote plus,
• m denote minus,
• s (for ‘sign’) denote λ + a + m, and
• p denote a point.
A constant in Pascal:
s d⁺ (p d⁺ + p d⁺ E s d⁺ + E s d⁺)
• First a sign,
• then one or more digits,
• then
– either a point and one or more digits,

which may or may not be followed by an
E, a sign and one or more digits,
– or just the E, the sign and one or more

digits.
If the constant is in exponential format, then
no decimal point is needed. If there is a decimal
point, there must be at least one digit on
either side.
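As an illustration, the expression above can be transcribed into Python's re syntax. The sketch follows the expression as written on the slide, so a plain integer such as 42 (no point and no E) is not matched; PASCAL_CONSTANT and is_constant are illustrative names.

```python
import re

# s d+ (p d+  +  p d+ E s d+  +  E s d+)
# s = optional sign, d = digit, p = point
PASCAL_CONSTANT = re.compile(
    r"[+-]?[0-9]+(?:\.[0-9]+|\.[0-9]+E[+-]?[0-9]+|E[+-]?[0-9]+)"
)

def is_constant(text):
    """True if the whole text matches the expression for a constant."""
    return PASCAL_CONSTANT.fullmatch(text) is not None
```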
Memory required to recognize a language
We want to discuss the problem of recognizing

the strings in a language. Let’s first establish
two conventions:
• We’ll have a single pass from left to right.
– This restriction is somewhat arbitrary:
two-way FA accept only regular languages
(proof in Hopcroft and Ullman),
– but it simplifies the discussion of how
much info must be remembered during
the process,
– and thus allows us to classify languages
on the basis of how much we must
remember at each step to recognize them.
• We won’t wait until we reach the end of a
given string to decide if the string is in the
language or not. At each step we make a
tentative decision on the basis of the string
of input symbols we’ve seen so far.
When we reach the end, the final decision

is simply the most recent tentative deci-
sion, which was made for the substring that
is actually the entire string.
How much must we remember about a prefix
in order to make a decision about it?
Two extremes:
• Everything, i.e., the entire substring
• Nothing, e.g.,
– language is empty: we can ignore input

and answer ‘no’ at each step,
– language is Σ∗: we can ignore input and

answer ‘yes’ at each step.
In both cases the answer we return is al-

ways the same, thus we don’t need to re-
member anything about the substring, or
distinguish one substring from another.
Between these extremes, the answer is not al-
ways the same. There are at least two strings
x and y for which the answers are different.
Thus the info we remember when we’ve re-

ceived input x must be different from what we
remember when we’ve received input y.
Thus in at least one of these two situations we

must remember something.
Let
L = {x ∈ {0, 1}∗ | x ends with 0}
The decision for any x ≠ λ depends only on its
last symbol.
Therefore we needn’t distinguish
• between one string ending with 0 and any
other ending with 0, or
• between one string ending with 1 and any
other ending with 1.
We haven’t considered λ yet.
Compare λ to any string ending in 1:
1. Neither is in L.
2. When you add one more symbol to each of

them, they both end in the same symbol.
The two resulting strings are either both in
or both not in L.
Therefore λ can be treated in exactly the same

way as any string ending in 1.
In summary, we have only two cases:
1. The input string ends with 0.
2. The input string doesn’t end with 0.
Let
L = {x ∈ {0, 1}∗ | x ends with 10}

From the definition it follows that we have to

distinguish between
• a string ending with 10, and
• a string not ending with 10.
Therefore we immediately have at least two

cases. Let’s call them 10 and N .
The decision for any x depends only on its last
two symbols.
Therefore we needn’t distinguish, say,


Can we take this one step further, and say that

it’s sufficient to remember whether the current
prefix ends with 10 or not? In other words, are
cases 10 and N the only ones?
Consider the string 01011: if you remember
only that it doesn’t end with 10, you cannot
distinguish it from 100, now or later.
Suppose the next input is 0.
• Case 01011: current prefix is 010110,

which is in L.
• Case 100: current prefix is 1000, which is

not in L.
Therefore you have to remember enough now

to distinguish between 01011 and 100.
In how many cases do we have to split case N ?
Consider two strings x and y, where x = 0 and

y is any string ending in 00:
• Neither is in L. Starting from either one

we need at least one more symbol to have
a string in L.
• Input 0: Both x0 and y0 end in 00, and are

not in L.
Input 1: Both x1 and y1 end in 01, and are

not in L.
Thus x and y represent cases that don’t need

to be distinguished. We merge them into one
case, called A.
Now consider the three strings x, y and z,
where x = 1, y ends in 01, and z ends in 11:
• Not one is in L. Starting from any one, we

need at least one more symbol to have a
string in L.
• Input 0: Each of x0, y0, and z0 ends in 10,

and is in L.
Input 1: Each of x1, y1, and z1 ends in 11,

and is not in L.
Thus these three strings represent three cases

that need not be distinguished at all. We
merge them into one case, called B.
Now consider the strings x and y, where x = λ
and y is any string in case A:
• Neither is in L.
• Input 0: Both x0 and y0 are strings
represented by case A.
Input 1: Both x1 and y1 are strings
represented by case B.
Thus x and y represent two cases that need

not be distinguished. We can thus include λ in
case A.
The total number of states is now three, and

this cannot be reduced any further.
Let
L = {x ∈ {0, 1}∗ | second-last symbol of x is 0}
From the definition it follows that we have to
distinguish between
• a string having second-last symbol 0, and
• a string having second-last symbol 1.
Therefore we immediately have at least two
cases. Let’s call them 0a and 1a.
Are two cases sufficient?
Consider case 0a:
It includes both 01 and 00.
Suppose the next input is 0:
• 010 ∉ L, but
• 000 ∈ L
Thus, we need to split case 0a into two cases,

say with labels 01 and 00.
We could have chosen input 1 to distinguish

the two strings.
Consider case 1a:
It includes both 10 and 11.
Suppose the next input is 0:
• 100 ∈ L, but
• 110 ∉ L
Thus, we need to split case 1a into two cases,

say with labels 10 and 11.
We could have chosen input 1 to distinguish

the two strings.
Thus we have established that it is necessary to
remember both of the last two symbols.
Therefore we already have four different cases.
What about the strings λ, 0, and 1? Does any

of them make a further case necessary?
Consider λ. Is it perhaps in case 00?
If we take input 1, then
• λ1 ∉ L, but
• 001 ∈ L
Therefore, it is necessary to distinguish be-

tween λ and 00.
Similarly, one could show that it is necessary

to distinguish λ from 01 and 10.
Finally, is λ in case 11?
For both, at least two more input symbols are

needed before the result can be in L.
By then, it will be irrelevant whether we started

off with λ or 11. Thus, λ is covered by case
11.
In a similar way one would show that 1 is cov-

ered by case 11.
Finally, consider 0: It must be distinguished
from
• 00, since 00 ∈ L, but 0 ∉ L.
• 01, since 01 ∈ L, but 0 ∉ L.
However, compare 0 and 10: Neither is in L,

but after one more input both resulting strings
will have the same last two symbols and then
it is irrelevant what came before.
Let
L = {x ∈ {0, 1}∗ | x ends with 11}
From the definition it follows that we have to
distinguish between
• a string ending with 11, and
• a string not ending with 11.
Therefore we immediately have at least two
cases. Let’s call them 11 and N.
The decision for any x depends only on its last
two symbols.
Therefore we needn’t distinguish, say,


Can we take this one step further, and say that

it’s sufficient to remember whether the current
prefix ends with 11 or not? In other words, are
cases 11 and N the only ones?
Consider the strings x and y, where x ends in
01 and y in 00. Both are in case N .
On input 1,
• x1 ∈ L, but
• y1 ∉ L.
Therefore we have to distinguish between x

and y. Thus case N contains at least two
cases.
Do strings ending with 10 also form another
case?
Consider the strings x and y, where x ends in

00 and y in 10.
Regardless of the input symbol, the two resulting
strings will end in the same two symbols. By then,
it will be irrelevant whether we started off with
x or y.
Thus we don’t have to distinguish between

strings ending in 00 or 10.
Finally, there are the strings λ, 0 and 1.
The strings λ and 0 don’t need to be distin-

guished from strings ending in 00, since we
need at least two more input symbols before
the result can be in L. By then, it will be
irrelevant with which we had started.
The string 1 doesn’t need to be distinguished

from strings ending in 01, since after one in-
put symbol the resulting strings will end in the
same two symbols.
L = {x ∈ {0, 1}∗ | x contains an even number of 0’s
and an odd number of 1’s}
We don’t have to remember the entire string

we’ve seen so far.
Steps of abstraction:
1. Only the number of 0’s and the number of 1’s
matter, not the way the symbols are arranged,
e.g., 011 and 101 are equivalent.
2. What is crucial is whether the number of 0’s,
and respectively the number of 1’s, is odd or even.
E.g., 011 and 0001111 are equivalent, but
011 and 001111 are not (add a 1 to see why).
Therefore
1. we have four distinct cases, and
2. it is enough to remember which case we

are in currently.
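The two steps of abstraction can be made concrete with a recognizer that remembers only two parities. This is a minimal sketch in Python; the names parity_state and in_L are illustrative.

```python
def parity_state(x):
    """Return (zeros_even, ones_even) after reading x one symbol at a time."""
    zeros_even, ones_even = True, True  # λ contains zero 0's and zero 1's
    for symbol in x:
        if symbol == "0":
            zeros_even = not zeros_even
        elif symbol == "1":
            ones_even = not ones_even
        else:
            raise ValueError("input must be a string over {0, 1}")
    return zeros_even, ones_even

def in_L(x):
    """x is in L iff it has an even number of 0's and an odd number of 1's."""
    zeros_even, ones_even = parity_state(x)
    return zeros_even and not ones_even
```

The pair returned by parity_state takes only four possible values, which correspond to the four cases; 011 and 0001111 give the same pair, while 011 and 001111 do not.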
L = {x ∈ {0, 1}∗ | x ends in 1 and
does not contain the substring 00}
Let s denote the string we’ve seen so far.
Case N: s contains 00. Then s ∉ L, and s is not
the prefix of any string in L.
s doesn’t contain 00: Here there are three

possible cases.
Case 0: Last symbol of s is 0.
1. Next input is 0: go to Case N .
2. Next input is 1: go to Case 1.
Case 1: Last symbol of s is 1. (Note errors in

Edition 2 of textbook.)
Case λ: s = λ.
1. Differs from Case N : E.g., λ01 ∈ L.
2. Differs from Case 0: E.g., next input is

0 - λ0 leads to Case 0, but 00 leads to
Case N .
3. Differs from Case 1: λ ∉ L, while Case 1

represents all strings which are in L.
Therefore
1. we have divided the strings in {0, 1}∗ into

four different types, and
2. in order to recognize strings in L, we need

only remember which of the four types we
have so far.
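Consider
[Figure: diagram whose circles are labeled with the parities of the # of 0’s and the # of 1’s]
and
[Figure: diagram whose circles are labeled with the cases λ, 0, 1, and N]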
We can interpret the diagrams in two ways:
1. as a flowchart of the algorithm we follow

when processing strings, or
2. as specifying an abstract machine.
Interpretation as flowchart:
The circles correspond to the distinct cases the

algorithm is keeping track of, or the distinct
types of strings we have divided Σ∗ into.
The label used in each circle describes
1. in the first chart, the parities of the # of

0’s and the # of 1’s in the current string,
or
2. in the second chart, the various cases we

considered regarding the string s.
Note the following aspects of such a diagram:
• The short arrow that doesn’t originate at
one of the circles indicates the starting
point; this case includes λ.
• The double circles indicate the cases in
which the current string is in L.
• The arrows starting at each circle indicate,
for each possible input, which case results,
and therefore how much info we need to
remember.
Interpretation as abstract machine:
At any time the machine is in one of four pos-

sible states, labeled λ, 0, 1, and N .
Initially, when the machine is activated, it’s in

state λ.
It receives successive inputs of 0 or 1. As a

result of being in a certain state and getting
a certain input, it moves to the state specified
by the corresponding arrow.
Finally, certain states are accepting states.
A string of 0’s and 1’s is in L iff the machine is

in an accepting state as a result of processing
that string.
The term abstract machine means that it’s a
specification of the capabilities that the ma-
chine must have.
Important are:
• the set of states, and
• the function that specifies, for each (state,

input symbol)-pair, the state the machine
goes to next.
Crucial is that the set of states is finite: the
number of states puts an absolute limit on the
amount of info the machine needs to, or is able
to, remember.
Strings in the language can be arbitrarily long,

but remembering a fixed amount of info is suf-
ficient.
The machine has only one form of memory,

namely being able to distinguish between these
states.
Such a machine can recognize only simple lan-

guages.
Finite automata
Definition: A finite automaton (FA for short) is a
5-tuple (Q, Σ, q0, A, δ), where
• Q is a finite set of states,
• Σ is a finite alphabet of input symbols,
• q0 ∈ Q is the initial state,
• A ⊆ Q is the set of accepting states,
• δ : Q × Σ → Q is the transition function.
For q ∈ Q, a ∈ Σ, δ (q, a) is the state to which

the FA moves if it is in state q and receives
input a.
Note: order in 5-tuple in Edition 2 wrong
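The 5-tuple translates naturally into a small data structure. The sketch below is one possible Python encoding, not a standard API: the class FA, its field names, and the example machine ENDS_WITH_0 (a two-state FA for the language of strings ending with 0, with hypothetical state names N and Z) are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FA:
    states: frozenset     # Q
    alphabet: frozenset   # Σ
    initial: str          # q0
    accepting: frozenset  # A
    delta: dict           # δ as a dict: (state, symbol) -> state

    def is_well_formed(self):
        """Check q0 ∈ Q, A ⊆ Q, and that δ is a total function Q × Σ → Q."""
        expected_keys = {(q, a) for q in self.states for a in self.alphabet}
        return (
            self.initial in self.states
            and self.accepting <= self.states
            and set(self.delta) == expected_keys
            and all(target in self.states for target in self.delta.values())
        )

# State "Z": the input read so far ends with 0.
# State "N": it does not end with 0 (this includes λ).
ENDS_WITH_0 = FA(
    states=frozenset({"N", "Z"}),
    alphabet=frozenset({"0", "1"}),
    initial="N",
    accepting=frozenset({"Z"}),
    delta={("N", "0"): "Z", ("N", "1"): "N",
           ("Z", "0"): "Z", ("Z", "1"): "N"},
)
```

Storing δ as a dictionary keyed by (state, symbol) pairs makes the requirement that δ is defined for every pair easy to check.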

Consider again
L = {x ∈ {0, 1}∗ | x ends with 10}
Until this FA gets at least two input symbols,
it remembers exactly what it has received.
Therefore there is a separate state for each
of λ, 0 and 1.
Once it has got two inputs, it remembers the

last two symbols it has seen. Therefore it cy-
cles between the states 00, 01, 11, and 10.
q     δ(q, 0)   δ(q, 1)
λ     0         1
0     00        01
1     10        11
00    00        01
01    10        11
10    00        01
11    10        11
We have already seen that we can reduce the
number of states.
Consider the three states 1, 01, and 11. Let

x, y and z be strings that take the FA to these
states.
• x, y and z don’t need to be distinguished

now, since no state is accepting.
• The rows of the states are exactly the

same. For all three, input 0 takes the FA to
state 10, and input 1 takes it to state 11.
Thus, after one input, we won’t be able to
distinguish between x, y and z anymore.
These three states do not represent three cases

that need to be distinguished. We merge them
into one state, called B.
Consider the three states 0, 00 and 10.
Their rows are identical.
However, 10 is an accepting state, while the

other two are not. Therefore we cannot simply
merge all three.
However, we can merge the two states 0 and

00. Call the resulting state A.
q     δ(q, 0)   δ(q, 1)
λ     A         B
A     A         B
B     10        B
10    A         B
The total number of states is now four.

Now consider the states λ and A:
• Neither is an accepting state.
• Their rows are the same.
We can thus include λ in A.
q     δ(q, 0)   δ(q, 1)
A     A         B
B     10        B
10    A         B
The total number of states is now three, and

this cannot be reduced any further.
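That the merging has not changed the accepted language can be checked on short inputs. The sketch below encodes the original seven-state table and the final three-state table in Python and compares their decisions; the names SEVEN, THREE, run and tables_agree are illustrative, and the empty string stands for the state λ.

```python
from itertools import product

# Each row maps a state to (next state on 0, next state on 1).
SEVEN = {
    "":   ("0", "1"),     # the state λ
    "0":  ("00", "01"),
    "1":  ("10", "11"),
    "00": ("00", "01"),
    "01": ("10", "11"),
    "10": ("00", "01"),
    "11": ("10", "11"),
}
THREE = {
    "A":  ("A", "B"),
    "B":  ("10", "B"),
    "10": ("A", "B"),
}

def run(table, start, x):
    """Return the state reached from start after reading x."""
    state = start
    for symbol in x:
        state = table[state][int(symbol)]
    return state

def tables_agree(max_len=10):
    """True if both machines accept exactly the same strings up to max_len."""
    for n in range(max_len + 1):
        for tup in product("01", repeat=n):
            x = "".join(tup)
            if (run(SEVEN, "", x) == "10") != (run(THREE, "A", x) == "10"):
                return False
    return True
```

In both tables the only accepting state is 10, so agreement means that the same strings end in that state.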
Consider
We could say that
• state A represents ‘no progress toward 10’,

(note error in Edition 3) since the FA has
received either no input at all or an input
string ending with 0, but not ending with
10,
• state B represents ‘halfway there’, since

the FA has received an input string end-
ing with 1.
We’ll later have a systematic procedure for

minimizing the number of states in a given FA.
We have
δ : Q × Σ → Q ,
the state to which the FA goes, if it is in state
q and receives input a.
We want
δ ∗ : Q × Σ∗ → Q ,
the state in which the FA ends up, if it begins
in state q and receives the string x of input
symbols.
We’ll develop a recursive definition:
• define δ ∗ (q, λ),
• assume we know what δ ∗ (q, y ) is,
• for a ∈ Σ, define δ ∗ (q, ya)
• Basis: input string is λ.
The FA shouldn’t change state as result of

getting λ, thus ∀q ∈ Q, δ ∗ (q, λ) = q.
• Assume we know what δ ∗ (q, y ) is.
• Consider δ ∗ (q, ya): It’s the state that the

FA reaches when it begins in q and receives
first input string y, then symbol a.
The FA is in δ ∗ (q, y ) after reading y. By

definition, for every p ∈ Q, the state to
which the FA moves from p on input a is
δ (p, a).
Thus δ ∗ (q, ya) = δ (δ ∗ (q, y ) , a).
Definition
Let M = (Q, Σ, q0, A, δ ) be an FA. We define
δ ∗ : Q × Σ∗ → Q
as follows:
• ∀q ∈ Q, δ ∗ (q, λ) = q.
• ∀q ∈ Q, y ∈ Σ∗, and a ∈ Σ,
δ∗(q, ya) = δ(δ∗(q, y), a)

In other words, δ ∗ (q, x) processes the symbols

of x one at a time, using δ to move from one
state to the next.
Example
δ∗(q, abc) = δ(δ∗(q, ab), c)
= δ(δ(δ∗(q, a), b), c)
= δ(δ(δ∗(q, λa), b), c)
= δ(δ(δ(δ∗(q, λ), a), b), c)
= δ(δ(δ(q, a), b), c)
= δ(δ(q1, b), c)
= δ(q2, c)
= q3
Note that as part of the calculation above, we
worked out
δ∗(q, a) = δ∗(q, λa)
= δ(δ∗(q, λ), a)
= δ(q, a)
Therefore, for strings of length 1, δ and δ ∗ can

be used interchangeably.
Another property that δ∗ satisfies:
∀q ∈ Q, ∀x, y ∈ Σ∗, δ∗(q, xy) = δ∗(δ∗(q, x), y)
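The recursive definition of δ∗ can be written down almost verbatim. The sketch below does this in Python for the three-state FA with states A, B and 10 from the earlier example; DELTA, delta_star and concatenation_property_holds are illustrative names, and the property is only tested on short strings, not proved.

```python
from itertools import product

# δ for the three-state FA accepting strings that end with 10
DELTA = {
    ("A", "0"): "A",   ("A", "1"): "B",
    ("B", "0"): "10",  ("B", "1"): "B",
    ("10", "0"): "A",  ("10", "1"): "B",
}

def delta_star(q, x):
    """δ*(q, λ) = q and δ*(q, ya) = δ(δ*(q, y), a)."""
    if x == "":
        return q
    y, a = x[:-1], x[-1]
    return DELTA[(delta_star(q, y), a)]

def concatenation_property_holds(max_len=4):
    """Test δ*(q, xy) = δ*(δ*(q, x), y) for all states and all short x, y."""
    strings = ["".join(t) for n in range(max_len + 1)
               for t in product("01", repeat=n)]
    return all(
        delta_star(q, x + y) == delta_star(delta_star(q, x), y)
        for q in ("A", "B", "10") for x in strings for y in strings
    )
```

The recursion peels off the last symbol, exactly as in the definition; an iterative loop over the symbols from left to right computes the same state.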

Definitions
Let M = (Q, Σ, q0, A, δ ) be an FA. A string

x ∈ Σ∗ is accepted by M if
δ ∗ (q0, x) ∈ A .
If a string is not accepted, it is rejected by M .
The language accepted or recognized by M , is

the set
L (M ) = {x ∈ Σ∗ | x is accepted by M }
If L is any language over Σ, L is accepted, or

recognized, by M if and only if L = L (M ).
Note the if and only if in the last statement.
It is not sufficient to say that L is accepted by

M if every string in L is accepted by M .
That would mean that the FA that accepts Σ∗

would accept any conceivable language over Σ.
The power of any abstract machine, such as

an FA, does not lie in the number of strings
that it accepts, but in its ability to discrimi-
nate between strings, and accept some, while
rejecting others.
To accept a language L, an FA must accept

all the strings in L and reject all the strings in
Σ∗ \ L.
Theorem
A language L over the alphabet Σ is regular if

and only if there is an FA that accepts L.
In other words,
• if M is an FA, then there is a regular expres-

sion corresponding to the language L (M ),
and
• if r is a regular expression, there is an FA

that accepts the language corresponding
to r.
There exist algorithms for constructing the

regular expression of the first case, and the
FA of the second case. We won’t cover that
in the course.
In the following we’ll do two examples of each

direction.
Consider the following FA with initial state A:
We want the regular expression corresponding

to the language L accepted by the FA.
We notice that there are two accepting states,

namely A and B.
• State A:
– it’s also the initial state, therefore λ ∈ L.
– the only other strings x for which

δ ∗ (A, x) = A are concatenations of 00.
This corresponds to (00)∗.
• State B:
– The only way to reach B from the initial

state A without looping through A is
with 11.
– δ ∗ (B, x) = B iff x consists of copies

of 11.
This corresponds to 11(11)∗.
It is not possible to reach A or B in any other

way. Therefore the regular expression is
(00)∗ + (00)∗11(11)∗ = (00)∗(11)∗ .
We want the regular expression for the lan-
guage L accepted by the following FA M :
We notice the following:
• Regardless of which state we are in, the

input b takes us to state B. Thus any string
ending in b takes us to state B.
• From state B, the input aaa takes us to

state E, the accepting state.
Therefore, if a string ends in baaa, it is ac-

cepted.
The only way to reach E is from D with input

a; the only way to reach D is from C with input
a; the only way to reach C is from B with input
a. The only way to reach B is with input b;
this can happen in any state. Thus any string
accepted by M must end in baaa.
Therefore the regular expression is

(a + b)∗baaa .
Consider
(a + b)∗ (ab + bba) (a + b)∗

We want the FA that accepts the correspond-
ing language L, the set of all strings over {a, b}
that contain at least one of ab and bba.
We note
• ab and bba should be accepted,
• if a string x is accepted, then xy, y ∈ {a, b}∗

should be accepted.
At states p, r and s, we need transitions la-
beled a, a, and b, respectively. We may need
additional states.
• state p, input a: let’s think of p as the non-

accepting state we are in if the last input
was a. If we are in p and the next input
is a, no progress towards a result has been
made; thus δ (p, a) = p.
• state r, input a: let’s think of r as the non-

accepting state we are in if the last input
was b, and that b wasn’t preceded by a b.
If we now get an a, the last b becomes
worthless. By our informal definition of p,
δ (r, a) = p.
• state s, input b: let’s think of s as the non-

accepting state we are in if the string thus
far ended in bb. If we are in s and the next
input is b, no progress towards a result has
been made; thus δ (s, b) = s.
Consider
(11 + 110)∗0
We want the FA that accepts the correspond-
ing language L.
• λ ∉ L, therefore q0 cannot be an accepting

state.
• 0 ∈ L, therefore 0 must take the FA to an

accepting state.
• 1 ∉ L
Moreover, we cannot lump 1 and λ into
one case, since there is at least one string
that discriminates between them:
1 110 ∉ L, but λ 110 ∈ L
L doesn’t contain
• strings beginning with 0, apart from 0 it-

self, or
• strings beginning with 10.
Therefore we introduce a state s that repre-

sents all strings that cannot be the prefix of
an element of L. s is a trap state.
Consider state r, input 1: The FA should not
• stay in r, since we need to differentiate
between 1 and 11:
1 10 ∈ L, but 11 10 ∉ L.
• return to q0, since we need to differentiate
between λ and 11:
λ 00 ∉ L, but 11 00 ∈ L.
Therefore we need a new state, say t.

Consider state t, input 0:
110 ∈ L, therefore the FA must go to an ac-

cepting state.
Can it go to state p, or do we need another

accepting state?
Suppose it went to p. Then it wouldn’t be able

to distinguish anymore between strings starting
with 0 and those starting with 110. But
00 ∉ L while 1100 ∈ L
Therefore we need another accepting state,
say u, and δ (t, 0) = u.
Consider
• state u, input 0: The FA is in the same

situation as after reading 0 only - 1100 ∈ L
but not the prefix of any longer string in
L.
Therefore δ (u, 0) = p.
• state t, input 1: We can interpret t as in-

dicating the end of a copy of 11. If we get
another 1, we can see it as the start of the
next copy of 11.
Therefore δ (t, 1) = r.
• state u, input 1: We can interpret u as

indicating the end of a copy of 110. If we
get another 1, we can see it as the start of
the next copy of 110.
Therefore δ (u, 1) = r.
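Putting the transitions together gives a complete table that can be compared with the regular expression on short inputs. The sketch below is our reading of the construction: the transitions out of q0, p, r (on input 0) and s are inferred from the discussion of the trap state rather than stated as equations, and the names DELTA, ACCEPTING, accepts and matches_expression are illustrative.

```python
import re
from itertools import product

DELTA = {
    ("q0", "0"): "p", ("q0", "1"): "r",
    ("p", "0"): "s",  ("p", "1"): "s",   # nothing in L extends the string 0
    ("r", "0"): "s",  ("r", "1"): "t",   # strings beginning with 10 are trapped
    ("t", "0"): "u",  ("t", "1"): "r",
    ("u", "0"): "p",  ("u", "1"): "r",
    ("s", "0"): "s",  ("s", "1"): "s",   # trap state
}
ACCEPTING = {"p", "u"}

def accepts(x):
    """Run the FA from q0 on x and report whether it ends in an accepting state."""
    state = "q0"
    for symbol in x:
        state = DELTA[(state, symbol)]
    return state in ACCEPTING

def matches_expression(max_len=12):
    """True if the FA and (11 + 110)*0 agree on all strings up to max_len."""
    pattern = re.compile(r"(?:11|110)*0")
    for n in range(max_len + 1):
        for tup in product("01", repeat=n):
            x = "".join(tup)
            if accepts(x) != (pattern.fullmatch(x) is not None):
                return False
    return True
```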
We continued to add states as long as it was
necessary. We stopped as soon as we were
able to define the required transitions in a way
that involved only the states we had already.
It is only due to our theorem (a language is

regular iff accepted by an FA) that we know
that the addition of states will stop.
If our language weren’t regular, we wouldn’t

be able to stop.
Perhaps the most difficult part of finding an FA

for a given language is determining whether or
not a new state is needed.
Distinguishing one string from another
It is possible to use a finite automaton to
recognize an infinite language,
• if we can group input strings in such a way

that strings in the same group do not need
to be distinguished from each other,
• which will make it unnecessary for the FA

to remember every input string.
The number of states in an FA
depends on
the number of distinct strings that must be

distinguished from each other.
Definitions
Let L be a language in Σ∗, and x ∈ Σ∗. Let
L/x = {z ∈ Σ∗ | xz ∈ L} .
Two strings x and y in Σ∗ are called distinguishable
w.r.t. L if L/x ≠ L/y.
Any string z ∈ Σ∗ that is in one of the two sets

but not the other, i.e., for which
xz ∈ L and yz ∉ L, or vice versa

is said to distinguish x and y w.r.t. L.
If
L/x = L/y ,
x and y are indistinguishable w.r.t. L.
In other words, x and y are indistinguishable

w.r.t. L if for every z, either both xz and yz
are in L, or neither xz nor yz is in L.
In order to show that two strings x and y are
distinguishable, it is sufficient to find one z so
that either
xz ∈ L and yz ∉ L
or
xz ∉ L and yz ∈ L .
In order to show that two strings x and y are

indistinguishable, you must be able to show
that no such z exists. This is usually much
more difficult than the previous case.
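The asymmetry between the two tasks shows up in a brute-force search. The sketch below looks for a distinguishing string z up to a given length, using the language of strings ending with 10 as a test case; find_distinguishing and ends_with_10 are illustrative names. A returned z proves distinguishability, whereas a result of None only says that no short z exists and is not a proof of indistinguishability.

```python
from itertools import product

def ends_with_10(x):
    """Membership test for L = {x | x ends with 10}."""
    return x.endswith("10")

def find_distinguishing(in_L, x, y, max_len=6, alphabet="01"):
    """Return a shortest z with exactly one of xz, yz in L, or None if none is found."""
    for n in range(max_len + 1):
        for tup in product(alphabet, repeat=n):
            z = "".join(tup)
            if in_L(x + z) != in_L(y + z):
                return z
    return None

z1 = find_distinguishing(ends_with_10, "01011", "100")  # the string "0"
z2 = find_distinguishing(ends_with_10, "0", "100")      # None
```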
Example
Let
L = {s ∈ {0, 1}∗ | s ends with 10}
• Let x = 01011 and y = 100.
Consider z = 0.
Then xz = 010110 ∈ L, but yz = 1000 ∉ L.
In other words, z ∈ L/x, but z ∉ L/y. Thus
x and y are distinguishable.
• Let x = 0 and y = 100.
For any z that ends with 10, xz ∈ L and
yz ∈ L. For any z that doesn’t end in
10, xz ∉ L and yz ∉ L. In other words,
L/x = L = L/y. Thus x and y are
indistinguishable w.r.t. L.
Another example:
Let
L = {0ᵏ1ᵏ | k ≥ 1} .
We can show that there is an infinite set of

strings in {0, 1}∗, any two of which are distin-
guishable with respect to L.
Let S = {0, 00, 000, . . .}.

Let x and y be two distinct elements of S.
Then x = 0ⁱ, i ≥ 1,
and y = 0ʲ, j ≥ 1, j ≠ i.
Let z = 1ⁱ.
Then xz = 0ⁱ1ⁱ ∈ L
and yz = 0ʲ1ⁱ ∉ L.
Consider the language
L = {x ∈ {0, 1}∗ | x ends with 01} .
The strings λ, 0, and 01 are pairwise distin-

guishable with respect to L.
• λ and 01:
λλ ∉ L, but 01λ ∈ L
• 0 and 01:
0λ ∉ L, but 01λ ∈ L
• λ and 0:
λ1 ∉ L, but 01 ∈ L
Therefore, λ, 0, and 01 are pairwise distin-

guishable.
We can put a lower bound on the memory re-
quirements of any FA that is capable of recog-
nizing a given language L.
Before we can prove the corresponding theo-

rem, we first need a lemma.
Lemma
Suppose that L ⊆ Σ∗ and M = (Q, Σ, q0, A, δ )

is an FA recognizing L. If x and y are strings
in Σ∗ that are distinguishable w.r.t. L, then
δ∗(q0, x) ≠ δ∗(q0, y).
Proof
x and y are distinguishable w.r.t. L.
Then there is a z such that z is in exactly one

of L/x and L/y.
Then exactly one of xz and yz is in L.
Then exactly one of δ∗(q0, xz) and δ∗(q0, yz) is
accepting.
Then
δ∗(q0, xz) ≠ δ∗(q0, yz)
However,
δ∗(q0, xz) = δ∗(δ∗(q0, x), z)
δ∗(q0, yz) = δ∗(δ∗(q0, y), z)
Thus
δ∗(q0, x) ≠ δ∗(q0, y)
Theorem
Suppose that L ⊆ Σ∗ and, for some positive

integer n, there are n strings in Σ∗, pairwise
distinguishable w.r.t. L. Then every FA recog-
nizing L must have at least n states.
Proof
Suppose x1, x2, . . . , xn are n strings, pairwise

distinguishable w.r.t. L.
Suppose M = (Q, Σ, q0, A, δ ) is an FA with

fewer than n states.
The states δ∗(q0, x1), δ∗(q0, x2), . . . , δ∗(q0, xn)
cannot all be distinct. Therefore, for some
i ≠ j, δ∗(q0, xi) = δ∗(q0, xj).
Since xi and xj are distinguishable w.r.t. L, the
lemma implies that M cannot recognize L.
Example
Suppose that n ≥ 1, and let

Ln = {x ∈ {0, 1}∗ | |x| ≥ n and
the nth symbol from the right in x is 1}
Naive approach: we create a distinct state for
every possible substring of length n or less, i.e.,
for λ, 0, 1, 00, 01, 11, 10, 000, . . ..
Then the FA can remember the last n symbols

of the current input string.
The total number of states is:

∑ᵢ₌₀ⁿ (number of strings of length i)
Since the input alphabet has only two symbols,
the number of strings of any length i is 2ⁱ.
Then the number of states is 2ⁿ⁺¹ − 1.

The (incomplete) FA for L3:
The FA has 2³⁺¹ − 1 = 15 states. The number
of states can be reduced.
101
We’ll show that any FA recognizing Ln must have at least 2^n states.
Let x and y be two distinct strings of length n.
Then there must be at least one i, 1 ≤ i ≤ n, such that x and y differ in position i. Mark that position. To the right of the marker there are n − i symbols. Now choose any string z ∈ {0, 1}∗ of length i − 1.
Then
|xz| = |yz| = n + i − 1.
To the right of the marker there are now
(n − i) + (i − 1) = n − 1
symbols. Thus the marked position is the nth from the right.
102
Since the strings xz and yz differ in this position, one of them must have a 1 in the nth position from the right, and the other a 0.
Therefore one is in Ln, and the other is not.
Thus x and y are distinguishable w.r.t. Ln.
There are 2^n strings of length n. We have shown that any two of them are distinguishable w.r.t. Ln. By our theorem every FA that recognizes Ln has at least 2^n states.
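The construction in this argument can be replayed for small n. The Python sketch below follows the proof: for each pair of distinct strings of length n it finds a position i where they differ, appends a suffix z of length i − 1, and checks that exactly one of xz and yz is in Ln. The function names and the choice of an all-0 suffix are assumptions of this sketch; the argument allows any z of that length.

```python
from itertools import combinations, product

def in_Ln(x, n):
    """Membership in Ln: |x| >= n and the nth symbol from the right is 1."""
    return len(x) >= n and x[-n] == "1"

def suffix_from_proof(x, y):
    """Find the first position i (1-based) where x and y differ and
    return a suffix of length i - 1 (all 0s; any string of that length works)."""
    i = next(k for k in range(1, len(x) + 1) if x[k - 1] != y[k - 1])
    return "0" * (i - 1)

def all_pairs_distinguished(n):
    """Check every pair of distinct strings of length n."""
    words = ["".join(p) for p in product("01", repeat=n)]
    for x, y in combinations(words, 2):
        z = suffix_from_proof(x, y)
        if in_Ln(x + z, n) == in_Ln(y + z, n):
            return False
    return True

print(all(all_pairs_distinguished(n) for n in range(1, 6)))  # True
```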
103
Recall the theorem: Suppose that L ⊆ Σ∗
and, for some positive integer n, there are n
strings in Σ∗, pairwise distinguishable w.r.t. L.
Then every FA that recognizes L has at least
n states.
Let L be any language: if for every positive integer n, there are n strings which are pairwise distinguishable w.r.t. L, then L cannot be recognized by an automaton with a finite number of states.
104
Often we
• don’t show that, for every n ≥ 1, there are n pairwise distinguishable strings,
• but instead that there is an infinite number of pairwise distinguishable strings.
After all, if we have an infinite set S of pairwise distinguishable strings, then, for any n, any subset of S of size n has n pairwise distinguishable elements.
It is sensible to choose a subset of S for which it is easy to prove that all strings are pairwise distinguishable.
105
Example:
Let L = {ww | w ∈ {0, 1}∗}.
Let S = {0, 00, 000, . . .}.
Let x, y ∈ S, x ≠ y.
Then x = 0^i and y = 0^j, j ≠ i.
Let z = 1 0^i 1.
Then xz = 0^i 1 0^i 1 ∈ L, but yz = 0^j 1 0^i 1 ∉ L.
Thus S is an infinite set of strings that are pairwise distinguishable w.r.t. L. Therefore L is not regular.
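The choice z = 1 0^i 1 can be tested for small exponents. The following Python sketch is a finite check, not a proof; the function names and the range of exponents are assumptions made for this illustration.

```python
def in_L(s):
    """Membership in L = {ww | w in {0, 1}*}."""
    half, remainder = divmod(len(s), 2)
    return remainder == 0 and s[:half] == s[half:]

def distinguishes(i, j):
    """Check that z = 1 0^i 1 puts 0^i z in L and keeps 0^j z out of L."""
    z = "1" + "0" * i + "1"
    return in_L("0" * i + z) and not in_L("0" * j + z)

print(all(distinguishes(i, j)
          for i in range(1, 12) for j in range(1, 12) if i != j))  # True
```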
106
Moreover, sometimes we
• don’t show that there is an infinite number of pairwise distinguishable strings,
• but that all strings over the alphabet are pairwise distinguishable.
Example:
Let Lpal be the language of palindromes over {0, 1}, i.e., the set of strings of 0’s and 1’s that are identical to their reverse. E.g., 0, 0110, 010 are in Lpal, but 1101 is not.
Lpal is very simple to describe, but we can show that we need to distinguish all strings over {0, 1}. Therefore we need much more memory to recognize it than that offered by any FA.
107
For Lpal we need to distinguish all strings: after processing any string x, we must remember enough to distinguish x from every other string, say y.
For every x and y, there is at least one possible string z of subsequent inputs that would cause us to make one decision for x and the opposite for y.
In other words, for any two distinct strings x and y, there is a string z so that exactly one of xz and yz is in Lpal.
108
Theorem:
Lpal cannot be accepted by any FA, and therefore is not regular.
Proof:
Let x, y ∈ {0, 1}∗, x ≠ y. We’ll show that x and y are distinguishable w.r.t. Lpal.
We consider two cases.
• |x| = |y|:
Let z = x^r.
Then xz = x x^r ∈ Lpal, but yz = y x^r ∉ Lpal.
109
• |x| ≠ |y|:
Assume |x| < |y|, and write y = y1 y2, with |y1| = |x|.
Let w ∈ {0, 1}∗. For z = w w^r x^r, xz = x w w^r x^r ∈ Lpal.
Now narrow down the choice of w: Let |w| = |y2|, but w ≠ y2.
Consider yz = y1 y2 z = y1 y2 w w^r x^r.
By choice, |y1| = |x^r|, and |y2| = |w| = |w^r|.
yz ∈ Lpal ⇒ w^r = y2^r ⇒ w = y2. Contradiction. Hence yz ∉ Lpal.
Any two distinct strings are therefore distinguishable w.r.t. Lpal, and we have shown that Lpal is not regular.
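Both cases of the proof give an explicit suffix z, so the argument can be replayed on all short strings. The Python sketch below builds z as in the proof and checks that exactly one of xz and yz is a palindrome. The function names, the way w is chosen (flipping the first symbol of y2), and the length bound are assumptions of this sketch.

```python
from itertools import product

def is_palindrome(s):
    return s == s[::-1]

def separating_suffix(x, y):
    """Build the suffix z from the proof for distinct strings x, y over {0, 1}."""
    if len(x) > len(y):
        x, y = y, x                        # the proof assumes |x| <= |y|
    if len(x) == len(y):
        return x[::-1]                     # case |x| = |y|: z = x^r
    y2 = y[len(x):]                        # y = y1 y2 with |y1| = |x|
    flipped = "1" if y2[0] == "0" else "0"
    w = flipped + y2[1:]                   # |w| = |y2| but w != y2
    return w + w[::-1] + x[::-1]           # case |x| < |y|: z = w w^r x^r

def check(max_len=5):
    """Verify the construction for all pairs of distinct strings up to max_len."""
    words = ["".join(p) for n in range(max_len + 1) for p in product("01", repeat=n)]
    for x in words:
        for y in words:
            if x != y:
                z = separating_suffix(x, y)
                if is_palindrome(x + z) == is_palindrome(y + z):
                    return False
    return True

print(check())  # True
```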
110
The concept of distinguishability gives us
• a method for showing that a given language is not regular. There are other methods, e.g., the so-called pumping lemma for regular languages.
• a tool for finding the FA with the fewest possible states that recognizes a given language.
111
Unions, Intersections, Complements
Suppose both L1 and L2 are regular languages over an alphabet Σ.
We know
• there are FAs M1 and M2 accepting L1 and L2, respectively,
• L1 ∪ L2, L1L2, and L1∗ are regular and can therefore be accepted by FAs.
Given M1 and M2, can we produce new FAs that accept the three new languages?
At this point we’ll consider the union, from which we can also derive methods for intersection and difference.
112
Consider union:
Suppose
M1 = (Q1, Σ, q1, A1, δ1)
and
M2 = (Q2, Σ, q2, A2, δ2).
A machine M can decide whether
x ∈ L1 ∪ L2
if it has enough information at each step to decide separately whether
x ∈ L1
and whether
x ∈ L2.
113
It needs to
• keep track of M1 and M2 simultaneously,
• therefore remember at each step the current state of both,
• therefore remember the ordered pair (p, q), where p ∈ Q1, q ∈ Q2.
114
Thus, the set of states is
Q1 × Q2.
Then the initial state must be
(q1, q2).
Accepting states? A string x should be accepted if it is in L1 or L2, thus a state (p, q) should be accepting if
p ∈ A1 or q ∈ A2.
Transition function? If the FA is in (p, q) and receives input a, it should move to
(δ1(p, a), δ2(q, a)),
since δ1(p, a) and δ2(q, a) are the states to which M1 and M2 would move.
115
Theorem
Suppose that
M1 = (Q1, Σ, q1, A1, δ1)
and
M2 = (Q2, Σ, q2, A2, δ2)
accept languages L1 and L2, respectively.
Let M = (Q, Σ, q0, A, δ), where
Q = Q1 × Q2,
q0 = (q1, q2),
for p ∈ Q1, q ∈ Q2, and a ∈ Σ,
δ((p, q), a) = (δ1(p, a), δ2(q, a)),
A = {(p, q) | p ∈ A1 or q ∈ A2}.
Then M accepts the language L1 ∪ L2.
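The construction in the theorem translates directly into code. The Python sketch below is one possible encoding, not a prescribed one: an FA is a tuple (states, alphabet, initial state, accepting states, transition dictionary), and the names product_fa, accepts and accept_pair, as well as the two sample machines, are assumptions made for this illustration.

```python
from itertools import product

def product_fa(m1, m2, accept_pair):
    """Product construction; accept_pair(in_A1, in_A2) selects accepting pairs."""
    q1_states, sigma, q1, a1, d1 = m1
    q2_states, _, q2, a2, d2 = m2
    states = set(product(q1_states, q2_states))
    # delta((p, q), a) = (delta1(p, a), delta2(q, a))
    delta = {((p, q), a): (d1[p, a], d2[q, a]) for (p, q) in states for a in sigma}
    accepting = {(p, q) for (p, q) in states if accept_pair(p in a1, q in a2)}
    return states, sigma, (q1, q2), accepting, delta

def accepts(m, x):
    """Run the FA m on x and report whether it ends in an accepting state."""
    _, _, state, accepting, delta = m
    for a in x:
        state = delta[state, a]
    return state in accepting

# Illustrative machines (assumptions): M1 accepts strings with an even number
# of 0s, M2 accepts strings ending in 1.
M1 = ({"even", "odd"}, "01", "even", {"even"},
      {("even", "0"): "odd", ("even", "1"): "even",
       ("odd", "0"): "even", ("odd", "1"): "odd"})
M2 = ({"no", "yes"}, "01", "no", {"yes"},
      {("no", "0"): "no", ("no", "1"): "yes",
       ("yes", "0"): "no", ("yes", "1"): "yes"})

# Union: a pair (p, q) accepts if p is in A1 or q is in A2.
M = product_fa(M1, M2, lambda in_a1, in_a2: in_a1 or in_a2)
print(accepts(M, "001"), accepts(M, "0"))  # True False
```

Passing the acceptance rule as a parameter keeps the states, the initial state and the transition function independent of the operation; only the set of accepting states depends on the rule.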

116
Example
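Let L1 and L2 be languages over {0, 1}, where
L1 = {x | 00 is not a substring of x}
L2 = {x | x ends with 01}.
Corresponding FAs (figures not reproduced): M1 has states A, B, C, initial state A and accepting states A and B; M2 has states P, Q, R, initial state P and accepting state R.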
117
L1 ∪ L2 is recognized by
Mu = (Qu, Σ, qu, Au, δu), where
• Qu = {A, B, C} × {P, Q, R}
= {(A, P), (A, Q), (A, R)} ∪
{(B, P), (B, Q), (B, R)} ∪
{(C, P), (C, Q), (C, R)}
• qu = (A, P)
• Au = {(A, P), (A, Q), (A, R)} ∪
{(B, P), (B, Q), (B, R)} ∪
{(C, R)}
118
• δu:
q         δu(q, 0)   δu(q, 1)
(A, P)    (B, Q)     (A, P)
(A, Q)    (B, Q)     (A, R)
(A, R)    (B, Q)     (A, P)
(B, P)    (C, Q)     (A, P)
(B, Q)    (C, Q)     (A, R)
(B, R)    (C, Q)     (A, P)
(C, P)    (C, Q)     (C, P)
(C, Q)    (C, Q)     (C, R)
(C, R)    (C, Q)     (C, P)
States (A, Q), (B, P) and (B, R) cannot be reached, therefore we can leave them out.
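The claim about unreachable states can be checked mechanically. The Python sketch below reads the component transitions off the table for δu (first coordinate for M1, second for M2) and explores the product from (A, P). The variable and function names are assumptions made for this illustration.

```python
SIGMA = "01"
# Component transitions, read off the two coordinates of the table for δu.
DELTA1 = {("A", "0"): "B", ("A", "1"): "A",
          ("B", "0"): "C", ("B", "1"): "A",
          ("C", "0"): "C", ("C", "1"): "C"}
DELTA2 = {("P", "0"): "Q", ("P", "1"): "P",
          ("Q", "0"): "Q", ("Q", "1"): "R",
          ("R", "0"): "Q", ("R", "1"): "P"}

def reachable(start):
    """Collect every product state reachable from `start`."""
    seen, frontier = {start}, [start]
    while frontier:
        p, q = frontier.pop()
        for a in SIGMA:
            nxt = (DELTA1[p, a], DELTA2[q, a])
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

all_states = {(p, q) for p in "ABC" for q in "PQR"}
unreachable = all_states - reachable(("A", "P"))
print(sorted(unreachable))  # [('A', 'Q'), ('B', 'P'), ('B', 'R')]
```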
119
120
Consider intersection:
A string x should be accepted if it is in L1 and L2, thus a state (p, q) should be accepting if
p ∈ A1 and q ∈ A2.
Therefore, in our previous theorem, we only need to define
A = {(p, q) | p ∈ A1 and q ∈ A2}
to have M accepting the language L1 ∩ L2.
Example (cont.):
Ai = {(A, R), (B, R)}
121
122
Consider the difference L1 \ L2 (the complement of L2 relative to L1):
A string x should be accepted if it is in L1, but not in L2, thus a state (p, q) should be accepting if
p ∈ A1 and q ∉ A2.
Therefore, in our previous theorem, we only need to define
A = {(p, q) | p ∈ A1 and q ∉ A2}
to have M accepting the language L1 \ L2.
Example (cont.):
Ac = {(A, P), (B, P), (A, Q), (B, Q)}
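Union, intersection and difference share the same states and transitions and differ only in the acceptance rule. The Python sketch below runs the two example machines in lockstep and applies the rule for the chosen operation. The component machines are inferred from the table for δu and from Au (an assumption, since the diagrams are not reproduced here), and the names RULES and accepts are hypothetical.

```python
# Component FAs of the example, inferred from the table for δu and from Au.
DELTA1 = {("A", "0"): "B", ("A", "1"): "A",
          ("B", "0"): "C", ("B", "1"): "A",
          ("C", "0"): "C", ("C", "1"): "C"}
DELTA2 = {("P", "0"): "Q", ("P", "1"): "P",
          ("Q", "0"): "Q", ("Q", "1"): "R",
          ("R", "0"): "Q", ("R", "1"): "P"}
A1, A2 = {"A", "B"}, {"R"}

# One acceptance rule per operation; only this part differs between the FAs.
RULES = {
    "union":        lambda in_a1, in_a2: in_a1 or in_a2,
    "intersection": lambda in_a1, in_a2: in_a1 and in_a2,
    "difference":   lambda in_a1, in_a2: in_a1 and not in_a2,
}

def accepts(x, operation):
    """Run both machines in lockstep from (A, P) and apply the chosen rule."""
    p, q = "A", "P"
    for a in x:
        p, q = DELTA1[p, a], DELTA2[q, a]
    return RULES[operation](p in A1, q in A2)

for x in ["101", "1001", "110"]:
    print(x, [accepts(x, op) for op in RULES])
```

For instance, 110 contains no 00 and does not end with 01, so it should be accepted for the union and the difference but not for the intersection.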
123
124
Special case: L1 = Σ∗, so that L1 \ L2 is the complement L2′ of L2.
Two methods:
• Use
M1 = ({q1}, Σ, q1, {q1}, δ1)
where
∀a ∈ Σ, δ1(q1, a) = q1
and proceed as above.
• Notice that a string x should be accepted if and only if it is not in L2. Therefore L2′ will be accepted by
M2′ = (Q2, Σ, q2, Q2 \ A2, δ2)
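The second method amounts to swapping accepting and non-accepting states. The Python sketch below shows this on the machine for L2 = {x | x ends with 01}; the tuple encoding and the names complement_fa and accepts are assumptions made for this illustration. The swap is sound here because the FA is deterministic and its transition function is defined for every state and symbol.

```python
def complement_fa(m):
    """Swap accepting and non-accepting states; everything else is unchanged."""
    states, sigma, start, accepting, delta = m
    return states, sigma, start, states - accepting, delta

def accepts(m, x):
    """Run the FA m on x and report whether it ends in an accepting state."""
    _, _, state, accepting, delta = m
    for a in x:
        state = delta[state, a]
    return state in accepting

# M2 for L2 = {x | x ends with 01}: states P, Q, R, initial state P, accepting {R}.
M2 = ({"P", "Q", "R"}, "01", "P", {"R"},
      {("P", "0"): "Q", ("P", "1"): "P",
       ("Q", "0"): "Q", ("Q", "1"): "R",
       ("R", "0"): "Q", ("R", "1"): "P"})

M2_complement = complement_fa(M2)
print(accepts(M2_complement, "0110"), accepts(M2_complement, "1101"))  # True False
```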
125
