
Module 2 contd: Pumping Lemma for Regular Languages, Closure Properties of Regular Languages


Lecture notes by Dr. K S Sudeep

 What are the languages for which we *cannot* design a DFA or an NFA?

Some examples of non-regular languages:

(i) {w: w contains an equal number of 0s and 1s}
L_equal = {ε, 01, 10, 1100, 0011, 0101, 1010, 1001, 0110, 110010, …}
(ii) {w: w is a palindrome (a string that reads the same when read
backwards)}
L_palindrome = {ε, 0, 1, 00, 11, 101, 010, 11011, 00100, 10101, 01010, …}
(iii) {w of the form 0^n 1^n: n is a non-negative integer}
That is, strings in which there are n 0s followed by n 1s, n ≥ 0.
L_{0^n 1^n} = {ε, 01, 0011, 000111, 00001111, …}
(iv) {Strings of the form w#w^R, where w^R is w in reverse order}
Here, the input alphabet contains 3 symbols: Σ = {0, 1, #}. The
symbol ‘#’ works as a separator between a string w and its reverse
string (or mirror reflection) w^R.
L_{w#w^R} = {#, 0#0, 01#10, 1001#1001, 011#110, …}

So, how do we decide if a language is regular or not? For this, there is a
simple tool that we can use, called the Pumping Lemma for regular languages.
Roughly, it says that if a language is regular, then every sufficiently long
string w in that language has a part that can be ‘pumped’ (repeated, like in a
loop) any number of times. So if a language fails this property, it cannot be
regular.

Pumping Lemma for regular languages: (Statement)

If L is a regular language, then there exists a positive integer constant n such that
every string w in L with length at least n can be broken into 3 strings, i.e.,
w = x ◦ y ◦ z (or w = xyz), such that:

(i) y is not the empty string,
(ii) |x ◦ y| ≤ n (the length of the first 2 parts combined is at most n), and
(iii) for every integer k ≥ 0, the string x ◦ y^k ◦ z is also in L (we can remove y or
‘pump’ y any number of times, and all such strings are also in L).

(Note that n depends only on L, not on the string w.)

We will do a proof later if time permits. First, let us try to understand this, and
verify that this holds for regular languages:

a. Strings of zeroes with even total length.

Take n = 2 (n is fixed for the language; it cannot depend on the string).
Now, consider any string w in this language of length at least 2.
These are 00, 0000, 000000, ….
First, let us take w = 00. How do we break it into x, y and z?
Let us try x = 0, y = 0, z = ε. This does not work: condition (iii) fails,
because for k = 0 we get x ◦ z = 0, which has odd length.
But we can indeed write w = 00 as ε ◦ 00 ◦ ε.
Here, x = ε, y = 00, z = ε.
Verify the following:
(i) y is 00, not the empty string.
(ii) The string xy is 00, whose length is 2, and n = 2. So |xy| ≤ n.
(iii) For every integer k ≥ 0, the string x ◦ y^k ◦ z is also in L. Why?
k = 0: we get x ◦ z, which is the empty string ε. It is in L.
k = 1: x ◦ y^1 ◦ z = 00, which is w itself, so it is in L.
k = 2: x ◦ y^2 ◦ z = 0000, which is also in L.
Likewise, for any non-negative integer k, x ◦ y^k ◦ z consists of 2k zeroes, so it is in L.

Let us try another string, w = 000000. We can write w = ε ◦ 00 ◦ 0000, i.e.,
x = ε, y = 00, z = 0000.
Convince yourself that here also, all the above statements hold true.
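
The checks in example (a) can be replayed mechanically. The following is a minimal Python sketch, not part of the original notes: the names pump, is_even_zeros and check_split are made up for illustration, and condition (iii) is only tested for k = 0 to max_k, so a True result is evidence and not a proof.

```python
def pump(x, y, z, k):
    """Return the string x . y^k . z."""
    return x + y * k + z

def is_even_zeros(s):
    """Membership test for example (a): only 0s, even total length."""
    return set(s) <= {"0"} and len(s) % 2 == 0

def check_split(x, y, z, n, in_lang, max_k=10):
    """Check lemma conditions (i)-(iii); (iii) is tested only for k = 0..max_k."""
    if y == "":               # condition (i): y must be non-empty
        return False
    if len(x) + len(y) > n:   # condition (ii): |xy| <= n
        return False
    # condition (iii), bounded: every pumped string must stay in the language
    return all(in_lang(pump(x, y, z, k)) for k in range(max_k + 1))

print(check_split("", "00", "", 2, is_even_zeros))      # True: the split used for w = 00
print(check_split("0", "0", "", 2, is_even_zeros))      # False: k = 0 gives "0"
print(check_split("", "00", "0000", 2, is_even_zeros))  # True: the split used for w = 000000
```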

b. Strings that end in a 1. (The alphabet is {0, 1}.)

The corresponding regular expression is (0+1)*1.
L_endin1 = {1, 01, 11, 001, 011, 101, 111, 0001, 0011, …}
Can n be 1 for this language?
Important: Remember that y cannot be the empty string,
and that for every w in L of length at least n, the lemma conditions must hold,
i.e., we can write w = xyz such that |xy| ≤ n and for every non-negative
integer k, x ◦ y^k ◦ z is in L. (y^k means y repeated k times.)
Since the empty string ε is not in L, we can say that n cannot be 1. Why?
If n = 1, w = ‘1’ is a string in L of length at least n.
If we try to write w = xyz, y must be the string ‘1’ (since y cannot be ε).
This means x = z = ε.
For k = 0, x ◦ y^k ◦ z becomes the empty string, which is not in L.
Can n be 2? The strings in L of length at least 2 are 01, 11, 001, 011, …
(The string w = ‘1’, whose only split is x = ε, y = ‘1’, z = ε, is shorter
than 2, so it no longer has to be pumped.)
Can we split every such string w as xyz, such that for
every non-negative integer k, x ◦ y^k ◦ z is in L?
Take w = 01, and put x = ε, y = 0, z = 1. The length of xy is 1, which is less than 2.
k = 0: we get ‘1’, which is in L.
k = 2: we get ‘001’, which is also in L.
k = 5: we get ‘000001’, which is also in L.
You may verify that, in a similar manner, any w in L of length at least 2
can be split into x, y and z such that all these conditions hold. (Try
strings like w = ‘11’ and w = ‘1011’.)
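
The question "can n be 1, can n be 2?" can also be explored by brute force. The sketch below is an illustration under stated assumptions: has_valid_split and n_works are hypothetical helper names, only strings up to length max_len are examined, and pumping is only tried for k = 0 to max_k. A False answer really does rule out that n, while a True answer only says that no counterexample was found within those bounds.

```python
from itertools import product

def ends_in_1(s):
    """Membership test for example (b): strings over {0, 1} that end in a 1."""
    return s.endswith("1")

def has_valid_split(w, n, in_lang, max_k=6):
    """Search every split w = xyz with y non-empty and |xy| <= n."""
    limit = min(n, len(w))
    for i in range(limit):                 # i = |x|
        for j in range(i + 1, limit + 1):  # j = |xy|
            x, y, z = w[:i], w[i:j], w[j:]
            if all(in_lang(x + y * k + z) for k in range(max_k + 1)):
                return True
    return False

def n_works(n, in_lang, alphabet="01", max_len=8):
    """True if every string of L with n <= length <= max_len has a valid split."""
    for length in range(n, max_len + 1):
        for chars in product(alphabet, repeat=length):
            w = "".join(chars)
            if in_lang(w) and not has_valid_split(w, n, in_lang):
                return False
    return True

print(n_works(1, ends_in_1))  # False: w = '1' cannot be pumped down
print(n_works(2, ends_in_1))  # True within the tested bounds
```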

Note: n can be taken to be the number of states in a DFA that recognizes
the language L, for example a DFA with the minimum number of states. (We
will see more on this when we do a proof of the Pumping Lemma.)
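
To get a feel for why the number of states matters, here is a small Python sketch of the usual pigeonhole idea: a DFA reading a string at least as long as its number of states must visit some state twice, and the part of the string read between the two visits can serve as y. This is an illustration only, not the proof from these notes; the function name split_by_repeated_state and the state names q0, q1 are assumptions made for the sketch.

```python
def split_by_repeated_state(delta, start, w):
    """Run the DFA on w and split w at the first state that is visited twice."""
    seen = {start: 0}      # state -> number of symbols read when it was first reached
    state = start
    for pos, symbol in enumerate(w, start=1):
        state = delta[(state, symbol)]
        if state in seen:  # the symbols read between the two visits form y
            i = seen[state]
            return w[:i], w[i:pos], w[pos:]
        seen[state] = pos
    return None            # no state repeated: w is shorter than the number of states

# Two-state DFA for "strings that end in a 1"; q1 is the accept state.
delta = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q0", ("q1", "1"): "q1",
}

print(split_by_repeated_state(delta, "q0", "01"))    # ('', '0', '1')
print(split_by_repeated_state(delta, "q0", "1011"))  # ('', '10', '11')
```

For w = 01 this returns the same split x = ε, y = 0, z = 1 that was used by hand in example (b).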

Another interesting example: The language {0} (with the single string ‘0’ in
it) is regular. How does the pumping lemma work on this language?
(Hint: Take n to be 2 in this case, then the Lemma holds true!)

How do we use pumping lemma to show that a language L is *not* regular?

1. Let us first try {w of the form 0^n 1^n: n is a non-negative integer}

L_{0^n 1^n} = {ε, 01, 0011, 000111, 00001111, …}

To show that L_{0^n 1^n} is not regular, it is enough to prove the claim below.

Claim: No matter what value you fix for n, there is a string w in this
language of length at least n that cannot be written as w = x ◦ y ◦ z such that
the pumping lemma properties hold. (We need not check all
values of k; it is enough to check k = 0 and k = 2 in most cases.)
That is, no split of w satisfies all of: (i) y is not the empty string,
(ii) the length of x ◦ y is at most n, and (iii) x ◦ z
and x ◦ y ◦ y ◦ z are also in L.

Proof: No matter what the value of n is, let us take w to be the string with n
zeroes followed by n 1s. It is easy to see that |w| ≥ n (as |w| = 2n) and that
w is in L. Now, let us try to write w = x ◦ y ◦ z, with y a non-empty string.
Case 1: If y contains only 0s, then for k = 0, the string x ◦ z (that we get on
removing the ‘y’ part from w) has fewer than n 0s but still n 1s, so it cannot be in L.
Case 2: If y contains some 1s (say y = ‘01’ or ‘0011’), then we have
|x ◦ y| > n, as y reaches into the second half of the string w. So we can rule this
out, because the first 2 parts together are longer than n.
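
The case analysis in this proof can be checked exhaustively for small n. The sketch below is only a sanity check under stated assumptions (the names in_0n1n and every_split_fails are hypothetical): for w made of n zeroes followed by n 1s, it tries every split with y non-empty and |xy| ≤ n, and confirms that none of them keeps both x ◦ z and x ◦ y ◦ y ◦ z in the language. It covers only the values of n that are tried, so it does not replace the proof.

```python
def in_0n1n(s):
    """Membership test for { 0^m 1^m : m >= 0 }."""
    m = len(s) // 2
    return s == "0" * m + "1" * m

def every_split_fails(n):
    """For w = 0^n 1^n, check that no allowed split survives both k = 0 and k = 2."""
    w = "0" * n + "1" * n
    for i in range(n):                 # i = |x|
        for j in range(i + 1, n + 1):  # j = |xy|, kept at most n
            x, y, z = w[:i], w[i:j], w[j:]
            if in_0n1n(x + z) and in_0n1n(x + y + y + z):
                return False           # this split would pass the k = 0 and k = 2 checks
    return True

print(all(every_split_fails(n) for n in range(1, 11)))  # True
```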

2. {Strings of the form w#w^R, where w^R is w in reverse order}

L_{w#w^R} = {#, 0#0, 01#10, 1001#1001, 011#110, …}
This one is easy. No matter what the value of n is, take any string in the
language of length at least n (for example, n 0s, then ‘#’, then n 0s) and
split it as x ◦ y ◦ z with y non-empty. If y lies
completely to the left or to the right of the special symbol ‘#’, then
on removing y, the string x ◦ z does not have the same
number of characters on both sides of ‘#’. If instead y contains ‘#’,
then after removal of y, the string does not contain the
symbol ‘#’. In every case, x ◦ z is not in L.
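
The argument above says that deleting any non-empty piece y from a string of this language always leaves the language. A short Python sketch can confirm this for all short members; the helper names in_w_hash_wr and no_deletion_stays_in_L are made up for this illustration, and only strings w#w^R with |w| ≤ 4 are tried.

```python
from itertools import product

def in_w_hash_wr(s):
    """Membership test for { w # w^R : w in {0,1}* }."""
    if s.count("#") != 1:
        return False
    left, right = s.split("#")
    return set(left) <= {"0", "1"} and right == left[::-1]

def no_deletion_stays_in_L(w):
    """True if removing any non-empty substring y from w gives a string outside L."""
    for i in range(len(w)):
        for j in range(i + 1, len(w) + 1):
            if in_w_hash_wr(w[:i] + w[j:]):  # x . z after deleting y = w[i:j]
                return False
    return True

# All members w # w^R with |w| <= 4.
members = ["".join(u) + "#" + "".join(u)[::-1]
           for length in range(5)
           for u in product("01", repeat=length)]

print(all(no_deletion_stays_in_L(w) for w in members))  # True
```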

3. Another example: {w: w contains an equal number of 0s and 1s}

Claim: No matter what the value of n is, there is a string w in this language of
length at least n that cannot be written as w = x ◦ y ◦ z such that y is not the empty
string, |x ◦ y| ≤ n, and the strings x ◦ z and x ◦ y ◦ y ◦ z are also in L.

Proof: Irrespective of the value of n, take w to be the string of n
zeroes followed by n 1s: w = 00…0011…11.

Here also, if we have to write w = x ◦ y ◦ z such that y is not the empty string and
|x ◦ y| ≤ n, then the only possible case is that y contains only 0s. Then
x ◦ z has fewer 0s than 1s and x ◦ y ◦ y ◦ z has more 0s than 1s, so neither is in L.
Exercise: Try proving that L_palindrome is not regular.

Closure Properties of Regular Languages

Theorem: The union of two regular languages is a regular language. (Or, regular
languages are closed under union.)

Proof idea: It is enough to show that if you have a DFA (or NFA) A for a language
L_A and another DFA (or NFA) B for a language L_B, then we can design an NFA C that
recognizes L_A ∪ L_B.

[Proof details: We already did this earlier, while showing that we can build an
NFA that recognizes the language of A + B from an NFA for A and an NFA for B.]

Theorem: Regular languages are closed under concatenation.

[Proof: This is also done when we showed that we can build an NFA for A ◦ B if
we have NFAs for expressions A and B].

Theorem: Regular languages are closed under Kleene Closure (star operation).

[Proof: This is also done already -- we showed that we can build an NFA for A*
if we have an NFA for expression A].

Theorem: Regular languages are closed under intersection.

Proof Idea: This is also simple, it is enough to show that one can build a DFA for
the language A ∩ B if we have DFAs for A and for B separately.

Proof details:

Let D_A = (Q_A, Σ, 𝛿_A, q0_A, F_A) be the DFA for A and D_B = (Q_B, Σ, 𝛿_B, q0_B, F_B) be the DFA
for B. Note that the input alphabet is the same for both of them.

Build the new DFA D_{A ∩ B} as follows.

The new set of states is Q’ = Q_A × Q_B (the cross product of the two sets of states), i.e.,
for every state q in Q_A and r in Q_B, (q, r) is a state in Q’.

The start state is the pair of the two start states, (q0_A, q0_B).

The transition function 𝛿 of D_{A ∩ B} works exactly like the transition functions of the
individual DFAs running ‘together’, i.e.,

𝛿((q, r), a) = (𝛿_A(q, a), 𝛿_B(r, a)).

A pair (q, r) in Q’ is an accept state if q is an accept state in D_A and r is an accept
state in D_B. In other words, (q, r) ∈ F if and only if q ∈ F_A and r ∈ F_B.
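
The product construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: a DFA is represented as a tuple (states, alphabet, delta, start, accepting), the function names intersect_dfas and accepts and all state names are made up for the example, and the start state of the product is taken to be the pair of the two start states, which is the usual choice for this construction.

```python
def intersect_dfas(dfa_a, dfa_b):
    """Product construction; each DFA is (states, alphabet, delta, start, accepting)."""
    states_a, alphabet, delta_a, start_a, accept_a = dfa_a
    states_b, _, delta_b, start_b, accept_b = dfa_b
    # Q' = Q_A x Q_B
    states = {(q, r) for q in states_a for r in states_b}
    # delta((q, r), a) = (delta_A(q, a), delta_B(r, a))
    delta = {((q, r), a): (delta_a[(q, a)], delta_b[(r, a)])
             for (q, r) in states for a in alphabet}
    # (q, r) accepts exactly when both components accept
    accepting = {(q, r) for (q, r) in states if q in accept_a and r in accept_b}
    return states, alphabet, delta, (start_a, start_b), accepting

def accepts(dfa, w):
    """Run the DFA on w and report whether it stops in an accepting state."""
    _, _, delta, state, accepting = dfa
    for symbol in w:
        state = delta[(state, symbol)]
    return state in accepting

# Example DFAs over {0, 1}: strings ending in 1, and strings of even length.
ends_in_1 = ({"e0", "e1"}, {"0", "1"},
             {("e0", "0"): "e0", ("e0", "1"): "e1",
              ("e1", "0"): "e0", ("e1", "1"): "e1"},
             "e0", {"e1"})
even_length = ({"p0", "p1"}, {"0", "1"},
               {("p0", "0"): "p1", ("p0", "1"): "p1",
                ("p1", "0"): "p0", ("p1", "1"): "p0"},
               "p0", {"p0"})

both = intersect_dfas(ends_in_1, even_length)
print(accepts(both, "01"))   # True: even length and ends in 1
print(accepts(both, "011"))  # False: odd length
print(accepts(both, "10"))   # False: ends in 0
```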

Theorem: Regular languages are closed under complementation. (i.e., if A is a
regular language, then its complement Ā is also a regular language.)

(Proof idea: We have seen this already. We only have to interchange the accept
states and non-accept states in the DFA of A to get a DFA for Ā.)
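
Interchanging accept and non-accept states is a one-line operation on a DFA. The sketch below assumes the same illustrative representation, a tuple (states, alphabet, delta, start, accepting), with made-up function and state names. It also assumes that the DFA has a transition for every state and symbol. The swap is for DFAs; doing the same on an NFA does not in general give the complement.

```python
def complement_dfa(dfa):
    """Swap accepting and non-accepting states of a complete DFA."""
    states, alphabet, delta, start, accepting = dfa
    return states, alphabet, delta, start, states - accepting

def accepts(dfa, w):
    """Run the DFA on w and report whether it stops in an accepting state."""
    _, _, delta, state, accepting = dfa
    for symbol in w:
        state = delta[(state, symbol)]
    return state in accepting

# DFA over {0, 1} for strings that end in a 1; e1 is the accept state.
ends_in_1 = ({"e0", "e1"}, {"0", "1"},
             {("e0", "0"): "e0", ("e0", "1"): "e1",
              ("e1", "0"): "e0", ("e1", "1"): "e1"},
             "e0", {"e1"})

not_ends_in_1 = complement_dfa(ends_in_1)
print(accepts(not_ends_in_1, ""))     # True: the empty string does not end in 1
print(accepts(not_ends_in_1, "10"))   # True
print(accepts(not_ends_in_1, "101"))  # False
```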

We can add more such theorems, but they would follow from the closure
properties we have given here. (For example, regular languages are closed under
set difference, i.e., A \ B is regular if A and B are regular. Note that A \ B is the same
as A ∩ B̄, where B̄ is the complement of B.)
