
Non-regular languages

Non-regular languages: $\{a^n b^n : n \ge 0\}$, $\{ww^R : w \in \{a, b\}^*\}$

Regular languages: $a^*b$, $b^*c + a$, $b + c(a + b)^*$, etc...
How can we prove that a language L
is not regular?

Prove that there is no DFA that accepts L

Problem: this is not easy to prove

Solution: the Pumping Lemma !!!


• DFA with 4 states

[Figure: DFA with states q1, q2, q3, q4 and transitions labeled a and b]

In walks of strings a, aa, aab: no state is repeated

In walks of strings aabb, bbaa, abbabb, abbbabbabb, ...: a state is repeated
If string $w$ has length $|w| \ge 4$:

Then the transitions of string w
are more than the states of the DFA

Thus, a state must be repeated


In general, for any DFA:

String $w$ has length $\ge$ number of states

A state q must be repeated in the walk of w

[Figure: walk of $w$ passing twice through the repeated state $q$]
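The pigeonhole argument can be tried out on a small machine. The sketch below assumes a hypothetical 4-state DFA stored as a transition dictionary (it is not necessarily the DFA drawn on the slide); `walk` and `first_repeat` are illustrative names.

```python
def walk(delta, start, w):
    """Return the list of states visited while reading w from start."""
    states = [start]
    for symbol in w:
        states.append(delta[(states[-1], symbol)])
    return states


def first_repeat(states):
    """Return (state, first_index, second_index) for the first state
    that shows up twice in the walk, or None if no state repeats."""
    seen = {}
    for i, q in enumerate(states):
        if q in seen:
            return q, seen[q], i
        seen[q] = i
    return None


# Hypothetical 4-state DFA over {a, b} (an assumption for illustration).
delta = {
    ("q1", "a"): "q2", ("q1", "b"): "q1",
    ("q2", "a"): "q3", ("q2", "b"): "q2",
    ("q3", "a"): "q3", ("q3", "b"): "q4",
    ("q4", "a"): "q1", ("q4", "b"): "q4",
}

print(first_repeat(walk(delta, "q1", "aab")))   # 3 transitions, 4 distinct states
print(first_repeat(walk(delta, "q1", "aabb")))  # 4 transitions force a repeat
```

A walk of length 4 visits 5 states, so with only 4 states available at least one of them must appear twice.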
The Pumping Lemma
Take an infinite regular language L

DFA that accepts $L$, with $m$ states

Take string $w$ with $w \in L$

There is a walk with label $w$
If string $w$ has length $|w| \ge m$ (number of states of the DFA),
then, from the pigeonhole principle:

a state $q$ is repeated in the walk of $w$

Let $q$ be the first state repeated
Write w x y z

...... q ......

x z
Observations: length | x y |  m number
of states
length | y | 1 of DFA

...... q ......

x z
Observation: The string $xz$ is accepted

Observation: The string $xyyz$ is accepted

Observation: The string $xyyyz$ is accepted

In General: The string $xy^i z$ is accepted, $i = 0, 1, 2, \ldots$

In General: $xy^i z \in L$, $i = 0, 1, 2, \ldots$ ($L$ is the original language)
In other words, we described:

The Pumping Lemma !!!


The Pumping Lemma:

• Given an infinite regular language $L$

• there exists an integer $m$

• for any string $w \in L$ with length $|w| \ge m$

• we can write $w = xyz$

• with $|xy| \le m$ and $|y| \ge 1$

• such that: $xy^i z \in L$, $i = 0, 1, 2, \ldots$
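The construction behind the lemma can be sketched in code: split $w$ at the first repeated state of its walk and check that every pumped string is still accepted. The 2-state DFA below (even number of a's) and the function names are assumptions chosen for illustration.

```python
def accepts(delta, start, finals, w):
    """Run the DFA on w and report whether it ends in a final state."""
    q = start
    for symbol in w:
        q = delta[(q, symbol)]
    return q in finals


def pump_split(delta, start, w):
    """Split w into (x, y, z) at the first repeated state of its walk."""
    seen = {start: 0}          # state -> number of symbols read when first visited
    q = start
    for i, symbol in enumerate(w, start=1):
        q = delta[(q, symbol)]
        if q in seen:
            j = seen[q]
            return w[:j], w[j:i], w[i:]
        seen[q] = i
    raise ValueError("no state repeats: w is shorter than the number of states")


# Hypothetical DFA with m = 2 states accepting strings with an even number of a's.
DELTA = {("e", "a"): "o", ("e", "b"): "e", ("o", "a"): "e", ("o", "b"): "o"}
START, FINALS = "e", {"e"}

x, y, z = pump_split(DELTA, START, "abab")
print(x, y, z)
print(all(accepts(DELTA, START, FINALS, x + y * i + z) for i in range(6)))
```

Because the split happens at the first repeat, the loop is found within the first $m$ transitions, which is where the bound $|xy| \le m$ comes from.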
Theorem: The language $L = \{a^n b^n : n \ge 0\}$ is not regular

Proof: Use the Pumping Lemma


Assume for contradiction that $L$ is a regular language

Since $L$ is infinite we can apply the Pumping Lemma

Let $m$ be the integer in the Pumping Lemma

Pick a string $w$ such that: $w \in L$ and length $|w| \ge m$

We pick $w = a^m b^m$
Write: $a^m b^m = xyz$

From the Pumping Lemma it must be that length $|xy| \le m$, $|y| \ge 1$

Since $|xy| \le m$, both $x$ and $y$ lie within the first $m$ symbols of $w$, which are all $a$'s

Thus: $y = a^k$, $k \ge 1$

From the Pumping Lemma: $xy^i z \in L$, $i = 0, 1, 2, \ldots$

Thus: $xy^2 z \in L$
$xyz = a^m b^m$, $y = a^k$, $k \ge 1$

From the Pumping Lemma: $xy^2 z \in L$

$xy^2 z = a^{m+k} b^m \in L$

Thus: $a^{m+k} b^m \in L$, $k \ge 1$

BUT: $L = \{a^n b^n : n \ge 0\}$ and $m + k \ne m$, so $a^{m+k} b^m \notin L$

CONTRADICTION!!!

Therefore: Our assumption that $L$ is a regular language is not true

Conclusion: L is not a regular language
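The case analysis in this proof can be checked mechanically for small $m$. The following sketch enumerates every split $w = xyz$ allowed by the lemma and confirms that pumping once always leaves $L$. It illustrates the argument for a few values of $m$ and is not a replacement for the proof; the function names are illustrative.

```python
def in_L(s):
    """Membership test for L = {a^n b^n : n >= 0}."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


def decompositions(w, m):
    """Yield every (x, y, z) with w = xyz, |xy| <= m and |y| >= 1."""
    for i in range(m):                 # i = |x|
        for j in range(i + 1, m + 1):  # j = |xy|
            yield w[:i], w[i:j], w[j:]


def pumping_refutes(m):
    """True if xy^2z falls outside L for every allowed split of a^m b^m."""
    w = "a" * m + "b" * m
    return all(not in_L(x + y * 2 + z) for x, y, z in decompositions(w, m))


print(all(pumping_refutes(m) for m in range(1, 8)))
```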


Non-regular languages: $\{a^n b^n : n \ge 0\}$

Regular languages

Non-regular languages: $L = \{ww^R : w \in \Sigma^*\}$

Regular languages
Theorem: The language $L = \{ww^R : w \in \Sigma^*\}$, with $\Sigma = \{a, b\}$, is not regular

Proof: Use the Pumping Lemma


Assume for contradiction that $L$ is a regular language

Since $L$ is infinite we can apply the Pumping Lemma

Let $m$ be the integer in the Pumping Lemma

Pick a string $w$ such that: $w \in L$ and length $|w| \ge m$

We pick $w = a^m b^m b^m a^m$
Write $a^m b^m b^m a^m = xyz$

From the Pumping Lemma it must be that length $|xy| \le m$, $|y| \ge 1$

Since $|xy| \le m$, both $x$ and $y$ lie within the first block of $m$ $a$'s

Thus: $y = a^k$, $k \ge 1$

From the Pumping Lemma: $xy^i z \in L$, $i = 0, 1, 2, \ldots$

Thus: $xy^2 z \in L$
$xyz = a^m b^m b^m a^m$, $y = a^k$, $k \ge 1$

From the Pumping Lemma: $xy^2 z \in L$

$xy^2 z = a^{m+k} b^m b^m a^m \in L$

Thus: $a^{m+k} b^m b^m a^m \in L$, $k \ge 1$

BUT: $L = \{ww^R : w \in \Sigma^*\}$ and $m + k \ne m$, so $a^{m+k} b^m b^m a^m \notin L$

CONTRADICTION!!!

Therefore: Our assumption that $L$ is a regular language is not true

Conclusion: L is not a regular language
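A quick check of the pumped string, as a sketch: a string has the form $ww^R$ exactly when it is a palindrome of even length, so membership is easy to test. `pumped` is an illustrative helper that builds $xy^2z$ directly from $m$ and $k$.

```python
def in_L(s):
    """Membership test for L = {w w^R}: even-length palindromes."""
    return len(s) % 2 == 0 and s == s[::-1]


def pumped(m, k):
    """xy^2z for w = a^m b^m b^m a^m when y = a^k lies in the first block of a's."""
    return "a" * (m + k) + "b" * (2 * m) + "a" * m


print(in_L("a" * 3 + "b" * 6 + "a" * 3))   # the chosen w for m = 3
print(any(in_L(pumped(m, k)) for m in range(1, 7) for k in range(1, m + 1)))
```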


Non-regular languages: $L = \{a^n b^l c^{n+l} : n, l \ge 0\}$

Regular languages
Theorem: The language $L = \{a^n b^l c^{n+l} : n, l \ge 0\}$ is not regular

Proof: Use the Pumping Lemma


n l n l
L  {a b c : n, l  0}

Assume for contradiction


that L is a regular language

Since L is infinite
we can apply the Pumping Lemma
n l n l
L  {a b c : n, l  0}

Let m be the integer in the Pumping Lemma

Pick a string w such that: w  L and

length | w| m

m m 2m
We pick wa b c
Write $a^m b^m c^{2m} = xyz$

From the Pumping Lemma it must be that length $|xy| \le m$, $|y| \ge 1$

Since $|xy| \le m$, both $x$ and $y$ lie within the first $m$ symbols of $w$, which are all $a$'s

Thus: $y = a^k$, $k \ge 1$

From the Pumping Lemma: $xy^i z \in L$, $i = 0, 1, 2, \ldots$

Thus: $xy^0 z = xz \in L$
$xyz = a^m b^m c^{2m}$, $y = a^k$, $k \ge 1$

From the Pumping Lemma: $xz \in L$

$xz = a^{m-k} b^m c^{2m} \in L$

Thus: $a^{m-k} b^m c^{2m} \in L$, $k \ge 1$

BUT: $L = \{a^n b^l c^{n+l} : n, l \ge 0\}$ and $(m - k) + m \ne 2m$, so $a^{m-k} b^m c^{2m} \notin L$

CONTRADICTION!!!

Therefore: Our assumption that $L$ is a regular language is not true

Conclusion: L is not a regular language
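This proof pumps down ($i = 0$) instead of up. A small sketch of the check, using Python's `re` module only to split a string into its blocks of a's, b's and c's; `pumped_down` is an illustrative helper.

```python
import re


def in_L(s):
    """Membership test for L = {a^n b^l c^(n+l) : n, l >= 0}."""
    match = re.fullmatch(r"(a*)(b*)(c*)", s)
    if match is None:
        return False
    n, l, c = (len(group) for group in match.groups())
    return n + l == c


def pumped_down(m, k):
    """xz (that is, xy^0z) for w = a^m b^m c^(2m) when y = a^k."""
    return "a" * (m - k) + "b" * m + "c" * (2 * m)


print(in_L("aabbbccccc"))
print(any(in_L(pumped_down(m, k)) for m in range(1, 7) for k in range(1, m + 1)))
```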


Non-regular languages: $L = \{a^{n!} : n \ge 0\}$

Regular languages
Example
$L = \{ w \mid w = 0^n \text{ where } n \text{ is a perfect square} \}$

Let us assume that $L$ is regular.

Use the Pumping Lemma.
Let $m$ be the corresponding number for the Pumping Lemma.
$w = 0^{m^2}$
$|w| \ge m$
$w = xyz$ where $|xy| \le m$; $|y| \ge 1$
Consider $xyyz$: it should be in $L$, so
$|xyyz|$ must be a perfect square,
but
$|xyyz| = m^2 + |y| \le m^2 + m < m^2 + m + m + 1 = (m+1)^2$
$m^2 < |xyyz| < (m+1)^2$
so the length of the string $xyyz$ cannot be a perfect square.
So it cannot be a member of $L$. A contradiction! Hence $L$ is non-regular.
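The inequality $m^2 < |xyyz| < (m+1)^2$ can be confirmed numerically for a range of $m$. This is only a sanity check of the arithmetic, with illustrative function names.

```python
import math


def is_square(n):
    """True if n is a perfect square (math.isqrt needs Python 3.8+)."""
    r = math.isqrt(n)
    return r * r == n


def pumped_lengths(m):
    """All possible lengths |xyyz| = m^2 + |y| with 1 <= |y| <= m."""
    return [m * m + k for k in range(1, m + 1)]


ok = all(
    m * m < n < (m + 1) ** 2 and not is_square(n)
    for m in range(1, 200)
    for n in pumped_lengths(m)
)
print(ok)
```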
Theorem: The language $L = \{a^{n!} : n \ge 0\}$ is not regular

$n! = 1 \cdot 2 \cdots (n-1) \cdot n$

Proof: Use the Pumping Lemma


Assume for contradiction that $L$ is a regular language

Since $L$ is infinite we can apply the Pumping Lemma

Let $m$ be the integer in the Pumping Lemma

Pick a string $w$ such that: $w \in L$ and length $|w| \ge m$

We pick $w = a^{m!}$
Write $a^{m!} = xyz$

From the Pumping Lemma it must be that length $|xy| \le m$, $|y| \ge 1$

Thus: $y = a^k$, $1 \le k \le m$

From the Pumping Lemma: $xy^i z \in L$, $i = 0, 1, 2, \ldots$

Thus: $xy^2 z \in L$
$xyz = a^{m!}$, $y = a^k$, $1 \le k \le m$

From the Pumping Lemma: $xy^2 z \in L$

$xy^2 z = a^{m!+k} \in L$

Thus: $a^{m!+k} \in L$, $1 \le k \le m$

Since: $L = \{a^{n!} : n \ge 0\}$

There must exist $p$ such that: $m! + k = p!$
However: m! k  m! m for m 1
 m! m!
 m!m  m!
 m!(m  1)
 (m  1)!

m! k  (m  1)!

m! k  p! for any p


m! k 1 k  m
a L

n!
BUT: L  {a : n  0}

m! k
a L

CONTRADICTION!!!
Therefore: Our assumption that L
is a regular language is not true

Conclusion: L is not a regular language
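A numeric sanity check of the factorial bound, as a sketch with illustrative names. It also shows why the restriction $m > 1$ matters: for $m = 1$ and $k = 1$ the pumped length is $1! + 1 = 2 = 2!$, which is a factorial.

```python
import math


def is_factorial(n):
    """True if n = p! for some integer p >= 0."""
    p, f = 1, 1
    while f < n:
        p += 1
        f *= p
    return f == n


def gap_holds(m):
    """Check m! < m! + k < (m+1)! and that m! + k is no factorial, for 1 <= k <= m."""
    lo, hi = math.factorial(m), math.factorial(m + 1)
    return all(lo < lo + k < hi and not is_factorial(lo + k) for k in range(1, m + 1))


print(all(gap_holds(m) for m in range(2, 10)))  # the bound holds for m > 1
print(gap_holds(1))                             # fails: 1! + 1 = 2!
```

Requiring $m > 1$ loses nothing, because the Pumping Lemma still holds when $m$ is replaced by any larger integer.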


Minimization of Finite Automata
• State minimization problem

• There can be more than one DFA that accepts the same language

Among these equivalent DFAs, it is often useful to find the smallest, that is, the DFA with the minimum possible number of states

This is especially important when DFAs are used for designing computer hardware circuits
Equivalent finite automata

[Figure: two equivalent finite automata over {a, b}, one with states 0 to 5 and one with states 0, 1, 2]
Unreachable states

A DFA sometimes contains states that cannot possibly be reached from the initial state

These can easily be identified and they can be removed without changing the language accepted by the DFA
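One way to identify the unreachable states is a breadth-first search from the initial state. The sketch below assumes the DFA is given as a dictionary mapping (state, symbol) to state; the 3-state machine and the function names are hypothetical.

```python
from collections import deque


def reachable_states(delta, start):
    """Breadth-first search over the transition graph from the initial state."""
    seen = {start}
    queue = deque([start])
    while queue:
        q = queue.popleft()
        for (source, _symbol), target in delta.items():
            if source == q and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def remove_unreachable(delta, start, finals):
    """Drop unreachable states; the accepted language is unchanged."""
    keep = reachable_states(delta, start)
    new_delta = {(q, a): t for (q, a), t in delta.items() if q in keep}
    return new_delta, finals & keep


# Hypothetical DFA: state 2 has no incoming path from state 0.
delta = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 0,
    (2, "a"): 1, (2, "b"): 2,
}
new_delta, new_finals = remove_unreachable(delta, 0, {1, 2})
print(sorted(reachable_states(delta, 0)), new_finals)
```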
Two states $p$ and $q$ of a DFA are indistinguishable if
$\delta^*(p, w) \in F$ implies $\delta^*(q, w) \in F$
and
$\delta^*(p, w) \notin F$ implies $\delta^*(q, w) \notin F$
for all $w \in \Sigma^*$.
If on the other hand there exists some string $w$ such that
$\delta^*(p, w) \in F$ and $\delta^*(q, w) \notin F$
or vice versa, then $p$ and $q$ are said to be distinguishable by the string $w$.
Procedure: mark
1- Remove all inaccessible states.
2- Consider all pairs of states $(p, q)$. If $p \in F$ and $q \notin F$ or vice versa, mark the pair $(p, q)$ as distinguishable.
3- Repeat the following step until no previously unmarked pairs are marked:
For all pairs $(p, q)$ and all $a \in \Sigma$ compute $\delta(p, a) = p_a$ and $\delta(q, a) = q_a$. If the pair $(p_a, q_a)$ is marked as distinguishable, mark $(p, q)$ as distinguishable.
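Steps 2 and 3 of the procedure can be sketched as follows, assuming step 1 has already been carried out and the DFA is complete. The 7-state DFA (strings over {0, 1} that end in 10) and the function signature are assumptions made for illustration.

```python
from itertools import combinations


def mark(states, alphabet, delta, finals):
    """Return the set of distinguishable pairs, each stored as frozenset({p, q})."""
    marked = set()
    # Step 2: a final and a non-final state are distinguishable.
    for p, q in combinations(states, 2):
        if (p in finals) != (q in finals):
            marked.add(frozenset((p, q)))
    # Step 3: repeat until a full pass marks nothing new.
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in alphabet:
                successors = frozenset((delta[(p, a)], delta[(q, a)]))
                if successors in marked:
                    marked.add(pair)
                    changed = True
                    break
    return marked


# Hypothetical DFA: states 1..7, start state 1, final state 6.
STATES = [1, 2, 3, 4, 5, 6, 7]
ALPHABET = ["0", "1"]
DELTA = {
    (1, "0"): 2, (1, "1"): 3, (2, "0"): 4, (2, "1"): 5,
    (3, "0"): 6, (3, "1"): 7, (4, "0"): 4, (4, "1"): 5,
    (5, "0"): 6, (5, "1"): 7, (6, "0"): 4, (6, "1"): 5,
    (7, "0"): 6, (7, "1"): 7,
}
marked = mark(STATES, ALPHABET, DELTA, {6})
unmarked = {frozenset(p) for p in combinations(STATES, 2)} - marked
print(sorted(sorted(pair) for pair in unmarked))
```

The pairs left unmarked are the indistinguishable ones; for this machine they group the states into {1, 2, 4}, {3, 5, 7} and {6}.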
Procedure: reduce
1- Use procedure mark to find all pairs of distinguishable states. Then from this, find the sets of all indistinguishable states, say $\{q_i, q_j, \ldots, q_k\}$, $\{q_l, q_m, \ldots, q_n\}$, etc.
2- For each set $\{q_i, q_j, \ldots, q_k\}$ of such indistinguishable states, create a state labeled $ij \ldots k$ for the new machine $M'$.
3- For each transition rule of $M$ of the form $\delta(q_r, a) = q_p$, find the sets to which $q_r$ and $q_p$ belong. If $q_r \in \{q_i, q_j, \ldots, q_k\}$ and $q_p \in \{q_l, q_m, \ldots, q_n\}$, add to $\delta'$ a rule
$\delta'(ij \ldots k, a) = lm \ldots n$
4- The initial state $q_0'$ is that state of $M'$ whose label includes the index of $q_0$, the initial state of $M$.
5- $F'$ is the set of all states whose label contains $i$ such that $q_i \in F$.
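A minimal sketch of procedure reduce, assuming the distinguishable pairs have already been computed and are passed in as a set of `frozenset` pairs. The tiny 3-state DFA, its hand-written `marked` set and the function signature are illustrative assumptions.

```python
def reduce_dfa(states, alphabet, delta, start, finals, marked):
    """Merge indistinguishable states; marked holds the distinguishable pairs."""
    # Step 1: group states into sets of mutually indistinguishable states.
    # Indistinguishability is an equivalence relation, so comparing against
    # one representative of each set is enough.
    blocks = []
    for q in states:
        for block in blocks:
            if frozenset((q, block[0])) not in marked:
                block.append(q)
                break
        else:
            blocks.append([q])
    # Step 2: one new state per set, labeled with the indices of its members.
    label = {}
    for block in blocks:
        name = "".join(str(q) for q in block)
        for q in block:
            label[q] = name
    # Step 3: carry every transition of M over to the new machine M'.
    new_delta = {(label[q], a): label[delta[(q, a)]] for q in states for a in alphabet}
    # Steps 4 and 5: initial and final states of M'.
    return set(label.values()), new_delta, label[start], {label[q] for q in finals}


# Hypothetical DFA over {a}: states 1 and 2 both accept every string.
states, alphabet = [0, 1, 2], ["a"]
delta = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 1}
marked = {frozenset((0, 1)), frozenset((0, 2))}   # assumed output of procedure mark
print(reduce_dfa(states, alphabet, delta, 0, {1, 2}, marked))
```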
Example

[Figure: DFA with states 0, 1, 2, 3, 4 over the alphabet {0, 1}]

Equivalent FA

[Figure: reduced DFA with states 0, 123, 4]
[Figure: DFA over the alphabet {0, 1}]
Example: minimize

[Figure: DFA with states 0, 1, 2, 3, 4, 5 over the alphabet {0, 1}]

Minimal state FA

[Figure: minimal DFA; state labels include 01 and 35]
Example

[Figure: DFA with states 1 to 7 over the alphabet {0, 1}]

Min state FA

[Figure: minimal DFA with states 124, 357, 6]

$L_1 = \lambda$
$L_2 = 0$
$L_3 = 1$
$L_4 = \Sigma^* 00$
$L_5 = \Sigma^* 01$
$L_6 = L = \Sigma^* 10$
$L_7 = \Sigma^* 11$
Lex
Lex: a lexical analyzer
• A Lex program recognizes strings

• For each kind of string found the Lex program takes an action
Input:
Var = 12 + 9;
if (test > 20)
temp = 0;
else
while (a < 20)
temp++;

Lex program

Output:
Identifier: Var
Operator: =
Integer: 12
Operator: +
Integer: 9
Semicolon: ;
Keyword: if
Parenthesis: (
Identifier: test
....
In Lex strings are described with regular expressions

Lex program
Regular expressions

"+"
"-"      /* operators */
"="

"if"
"then"   /* keywords */

Lex program
Regular expressions

(0|1|2|3|4|5|6|7|8|9)+    /* integers */

(a|b|..|z|A|B|...|Z)+    /* identifiers */
Internal Structure of Lex

Lex: Regular expressions → NFA → DFA → Minimal DFA

The final states of the DFA are associated with actions
Lex matches the longest input string

Example: Regular Expressions "if" and "ifend"

Input:   ifend    if    ifn
Matches: "ifend"  "if"  no match


Input:
1234 test
var 566 78
9800 +
temp

Output:
Integer
Identifier
Identifier
Integer
Integer
Integer
Error in line: 3
Identifier