
Pumping Lemma for Context-free Languages

Fall 2005 Costas Busch - RPI 1


Take an infinite context-free language.

Its grammar generates an infinite number of different strings.

Example:

$S \to ABE \mid bBd$
$A \to Aa \mid a$
$B \to bSD \mid cc$
$D \to Dd \mid d$
$E \to eE \mid e$
In the derivation of a long enough string, variables are repeated.

A possible derivation:

$S \Rightarrow ABE \Rightarrow AaBE \Rightarrow aaBE$
$\Rightarrow aabSDE \Rightarrow aabbBdDE$
$\Rightarrow aabbccdDE \Rightarrow aabbccddE$
$\Rightarrow aabbccddeE \Rightarrow aabbccddee$
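The derivation above can be replayed mechanically. The following is a minimal sketch, assuming the grammar is encoded as a Python dictionary and that every step rewrites the leftmost variable (which is how the derivation on the slide proceeds); the names `GRAMMAR`, `STEPS` and `leftmost_derivation` are illustrative and not part of the original slides.

```python
# Example grammar from the slides: variable -> list of right-hand sides.
GRAMMAR = {
    "S": ["ABE", "bBd"],
    "A": ["Aa", "a"],
    "B": ["bSD", "cc"],
    "D": ["Dd", "d"],
    "E": ["eE", "e"],
}

# The productions used in the derivation, in order.
STEPS = [
    ("S", "ABE"), ("A", "Aa"), ("A", "a"), ("B", "bSD"), ("S", "bBd"),
    ("B", "cc"), ("D", "d"), ("E", "eE"), ("E", "e"),
]


def leftmost_derivation(start, steps):
    """Apply each production to the leftmost variable; return all sentential forms."""
    form = start
    forms = [form]
    for var, rhs in steps:
        assert rhs in GRAMMAR[var], f"{var} -> {rhs} is not a production"
        # Position of the leftmost variable in the current sentential form.
        pos = next(i for i, ch in enumerate(form) if ch in GRAMMAR)
        assert form[pos] == var, f"leftmost variable is {form[pos]}, not {var}"
        form = form[:pos] + rhs + form[pos + 1:]
        forms.append(form)
    return forms


forms = leftmost_derivation("S", STEPS)
print(" => ".join(forms))
```

The variable $B$ is rewritten twice in `STEPS`, which is the repetition the following slides exploit.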
Derivation tree of $aabbccddee$

[Figure: derivation tree of $aabbccddee$. The variable $B$ is repeated along one path from the root: $S \to ABE$, $B \to bSD$, $S \to bBd$, $B \to cc$.]
$B \Rightarrow bSD \Rightarrow bbBdD \Rightarrow bbBdd$

[Figure: the part of the tree between the two occurrences of $B$, with yield $bbBdd$.]

$B \Rightarrow^{*} bbBdd$
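$S \Rightarrow ABE \Rightarrow AaBE \Rightarrow aaBE \Rightarrow aaBeE \Rightarrow aaBee$

[Figure: the part of the tree above the first occurrence of $B$, with yield $aaBee$.]

$S \Rightarrow^{*} aaBee$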
[Figure: the subtree below the second occurrence of $B$, with yield $cc$.]

$B \Rightarrow cc$


Putting it all together

[Figure: the full derivation tree of $aabbccddee$, split into the three parts above.]

$S \Rightarrow^{*} aaBee$, $B \Rightarrow^{*} bbBdd$, $B \Rightarrow cc$

Using only the first and the third derivation:

$S \Rightarrow^{*} aaBee \Rightarrow^{*} aaccee = aa(bb)^{0}cc(dd)^{0}ee$

$aa(bb)^{0}cc(dd)^{0}ee \in L(G)$
We have removed the middle part

[Figure: derivation tree of $aaccee$, in which $B \to cc$ is applied directly below $S \to ABE$.]

$S \Rightarrow^{*} aa(bb)^{0}cc(dd)^{0}ee$
$S \Rightarrow^{*} aaBee$, $B \Rightarrow^{*} bbBdd$, $B \Rightarrow cc$

$S \Rightarrow^{*} aaBee \Rightarrow^{*} aabbBddee$
$\Rightarrow^{*} aa(bb)^{2}B(dd)^{2}ee \Rightarrow^{*} aa(bb)^{2}cc(dd)^{2}ee$

$aa(bb)^{2}cc(dd)^{2}ee \in L(G)$
We have repeated the middle part two times

[Figure: derivation tree in which the part for $B \Rightarrow^{*} bbBdd$ appears twice (copies 1 and 2) above $B \Rightarrow cc$.]

$S \Rightarrow^{*} aa(bb)^{2}cc(dd)^{2}ee$
$S \Rightarrow^{*} aaBee$, $B \Rightarrow^{*} bbBdd$, $B \Rightarrow cc$

$S \Rightarrow^{*} aa(bb)^{3}cc(dd)^{3}ee \in L(G)$


We have repeated the middle part three times

[Figure: derivation tree in which the part for $B \Rightarrow^{*} bbBdd$ appears three times (copies 1, 2 and 3) above $B \Rightarrow cc$.]

$S \Rightarrow^{*} aa(bb)^{3}cc(dd)^{3}ee$
In general:

$S \Rightarrow^{*} aaBee$, $B \Rightarrow^{*} bbBdd$, $B \Rightarrow cc$

$S \Rightarrow^{*} aa(bb)^{i}cc(dd)^{i}ee \in L(G)$

for any $i \ge 0$
Repeat the middle part $i$ times

[Figure: derivation tree in which the part for $B \Rightarrow^{*} bbBdd$ appears $i$ times (copies 1 to $i$) above $B \Rightarrow cc$.]

$S \Rightarrow^{*} aa(bb)^{i}cc(dd)^{i}ee$
From the grammar

$S \to ABE \mid bBd$
$A \to Aa \mid a$
$B \to bSD \mid cc$
$D \to Dd \mid d$
$E \to eE \mid e$

and the string $aabbccddee \in L(G)$

we inferred that a family of strings is in $L(G)$:

$S \Rightarrow^{*} aa(bb)^{i}cc(dd)^{i}ee \in L(G)$ for any $i \ge 0$
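The claim that the whole family belongs to $L(G)$ can be spot-checked for small $i$. The sketch below is an illustration, not a general parser: it assumes single-character symbols and relies on the fact that this grammar has no unit- or empty-string productions, so every symbol derives a non-empty string and the recursion always moves to shorter substrings. The names `derives`, `matches` and `pumped` are hypothetical helpers.

```python
from functools import lru_cache

# Example grammar from the slides: variable -> list of right-hand sides.
GRAMMAR = {
    "S": ["ABE", "bBd"],
    "A": ["Aa", "a"],
    "B": ["bSD", "cc"],
    "D": ["Dd", "d"],
    "E": ["eE", "e"],
}


@lru_cache(maxsize=None)
def derives(symbol, s):
    """True if `symbol` (a variable or a terminal) derives exactly the string s."""
    if symbol not in GRAMMAR:          # terminal: must match literally
        return symbol == s
    return any(matches(rhs, s) for rhs in GRAMMAR[symbol])


@lru_cache(maxsize=None)
def matches(rhs, s):
    """True if the symbol sequence rhs derives exactly the string s."""
    head, rest = rhs[0], rhs[1:]
    if not rest:
        return derives(head, s)
    # Every symbol derives at least one character, so leave len(rest) for the tail.
    return any(
        derives(head, s[:k]) and matches(rest, s[k:])
        for k in range(1, len(s) - len(rest) + 1)
    )


def pumped(i):
    """The string aa (bb)^i cc (dd)^i ee."""
    return "aa" + "bb" * i + "cc" + "dd" * i + "ee"


for i in range(5):
    print(i, pumped(i), derives("S", pumped(i)))
```

A finite check like this only illustrates the argument; the derivations $S \Rightarrow^{*} aaBee$, $B \Rightarrow^{*} bbBdd$ and $B \Rightarrow cc$ are what establish membership for every $i \ge 0$.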
Consider now an Arbitrary Grammar

Consider now an arbitrary infinite context-free language $L$.

Let $G$ be a grammar for $L - \{\lambda\}$.

Take $G$ so that it has no unit-productions and no $\lambda$-productions (remove them).
Let $r$ be the number of variables.

Let $t$ be the maximum size of the right-hand side of any production.

Example:

$S \to ABE \mid bBd$
$A \to Aa \mid a$
$B \to bSD \mid cc$
$D \to Dd \mid d$
$E \to eE \mid e$

$r = 5$, $t = 3$
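The two parameters can be read off the grammar directly. A minimal sketch, assuming the example grammar is stored as a dictionary from variables to right-hand sides (an encoding chosen here for illustration):

```python
# Example grammar from the slides: variable -> list of right-hand sides.
GRAMMAR = {
    "S": ["ABE", "bBd"],
    "A": ["Aa", "a"],
    "B": ["bSD", "cc"],
    "D": ["Dd", "d"],
    "E": ["eE", "e"],
}

# r: number of variables of the grammar.
r = len(GRAMMAR)

# t: maximum length of the right-hand side of any production.
t = max(len(rhs) for alternatives in GRAMMAR.values() for rhs in alternatives)

print(r, t)  # 5 3
```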
Let $m = t^{r} + 1$.

Take a string $w \in L(G)$ with length $|w| \ge m$.

Claim: in the derivation tree of $w$ there is a path from the root to a leaf where a variable of $G$ is repeated.


[Figure: derivation tree of $w$ with root $S$ and $|w| \ge m$. We will show that some variable $H$ is repeated on a path.]
Proof of Claim:

We will show that the tree of $w$ has at least one path with $r + 2$ nodes.

Suppose the opposite: every path has at most $r + 1$ nodes, so the tree has at most $r + 1$ levels (levels $0$ to $r$).


Maximum number of nodes per level:

Level 0: $1$ node
Level 1: $t$ nodes
Level 2: $t^{2}$ nodes
Level $i$: $t^{i}$ nodes
Level $r$: $t^{r}$ nodes

With at most $r + 1$ levels:

Maximum possible string length = max nodes at level $r$ = $t^{r}$
Therefore,

maximum length of string $w$: $|w| \le t^{r}$

However, we took $|w| \ge m = t^{r} + 1$

Contradiction!
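For the example grammar with $r = 5$ and $t = 3$, the counting argument can be written as one chain of inequalities. This is only a worked illustration, and it assumes the reading $m = t^{r} + 1$ of the definition of $m$:

```latex
\[
  |w| \;\le\; t^{r} \;=\; 3^{5} \;=\; 243
  \;<\; 244 \;=\; t^{r} + 1 \;=\; m \;\le\; |w|
\]
```

The chain would give $|w| < |w|$, so a tree with at most $r + 1$ levels cannot yield a string of length at least $m$.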


Thus, there is a path from the root to a leaf with at least $r + 2$ nodes:

$V_1 = S,\ V_2,\ V_3,\ \ldots,\ V_{r+1}$ (at least $r + 1$ variables), followed by a terminal symbol at the leaf.

[Figure: a root-to-leaf path spanning at least $r + 2$ levels.]
Since there are at most $r$ different variables, some variable is repeated (pigeonhole principle).

[Figure: the path $V_1 = S, V_2, V_3, \ldots, V_{r+1}$, on which some variable $H$ appears twice.]

END OF PROOF
Take now a string $w$ with $|w| \ge m$.

Some variable $H$ is repeated.

[Figure: derivation tree with root $S$; $H$ appears twice on one path, and the upper $H$ is the root of a subtree.]

Take $H$ to be deep, so that no other variable is repeated on any path in the subtree.
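Choosing a deep repetition can be sketched as a scan from the leaf upwards along one path. The function below is an illustrative helper (the name `deepest_repeat` and the list encoding of a path are assumptions); it returns the first variable that shows up twice when the path is read from the bottom.

```python
def deepest_repeat(path):
    """Scan a root-to-leaf list of variables from the leaf upwards.

    Returns (variable, upper_index, lower_index) for the first variable
    seen twice, or None if no variable is repeated on the path.
    """
    seen = {}  # variable -> index of its occurrence closest to the leaf
    for depth in range(len(path) - 1, -1, -1):
        var = path[depth]
        if var in seen:
            return var, depth, seen[var]
        seen[var] = depth
    return None


# Variables on the path S -> B -> S -> B -> c in the tree of aabbccddee.
print(deepest_repeat(["S", "B", "S", "B"]))  # ('B', 1, 3)
```

On the example path the scan reports $B$, the variable that later plays the role of $H$; between the two reported positions no other variable occurs twice.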
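$w = uvxyz$

[Figure: derivation tree with root $S$. The part outside the upper $H$ yields $u$ (left) and $z$ (right); the part between the two occurrences of $H$ yields $v$ (left) and $y$ (right); the lower $H$ yields $x$.]

$u, v, x, y, z$: strings of terminals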
Example:

[Figure: derivation tree of $aabbccddee$ with the yields $u, v, x, y, z$ marked.]

$u = aa$
$v = bb$
$x = cc$
$y = dd$
$z = ee$

$B$ corresponds to $H$
Possible derivations:

$S \Rightarrow^{*} uHz$
$H \Rightarrow^{*} vHy$
$H \Rightarrow^{*} x$
Example:

$u = aa$, $v = bb$, $x = cc$, $y = dd$, $z = ee$

$B$ corresponds to $H$

$S \Rightarrow^{*} uHz$, $H \Rightarrow^{*} vHy$, $H \Rightarrow^{*} x$

$S \Rightarrow^{*} aaBee$, $B \Rightarrow^{*} bbBdd$, $B \Rightarrow cc$
$S \Rightarrow^{*} uHz$, $H \Rightarrow^{*} vHy$, $H \Rightarrow^{*} x$

$i = 0$: $S \Rightarrow^{*} uHz \Rightarrow^{*} uxz = uv^{0}xy^{0}z \in L(G)$

$i = 1$: $S \Rightarrow^{*} uHz \Rightarrow^{*} uvHyz \Rightarrow^{*} uvxyz = uv^{1}xy^{1}z \in L(G)$
(the original $w = uvxyz$)

$i = 2$: $S \Rightarrow^{*} uHz \Rightarrow^{*} uvHyz \Rightarrow^{*} uvvHyyz \Rightarrow^{*} uvvxyyz = uv^{2}xy^{2}z \in L(G)$

$i = 3$: $S \Rightarrow^{*} uHz \Rightarrow^{*} uvHyz \Rightarrow^{*} uvvHyyz \Rightarrow^{*} uvvvHyyyz \Rightarrow^{*} uvvvxyyyz = uv^{3}xy^{3}z \in L(G)$

In general: $S \Rightarrow^{*} uHz \Rightarrow^{*} uvHyz \Rightarrow^{*} uvvHyyz \Rightarrow^{*} \cdots \Rightarrow^{*} uv^{i}Hy^{i}z \Rightarrow^{*} uv^{i}xy^{i}z \in L(G)$


Therefore,

if we know that $w = uvxyz \in L(G)$,

we also know that $uv^{i}xy^{i}z \in L(G)$ for all $i \ge 0$.

Since $L(G) = L - \{\lambda\}$:

$uv^{i}xy^{i}z \in L$
Observations:

$|vy| \ge 1$, since there are no unit or $\lambda$-productions.

$|vxy| \le m$, since no variable is repeated in any path in the subtree.

[Figure: derivation tree with root $S$, yields $u$, $v$, $x$, $y$, $z$, and the subtree rooted at the upper $H$ marked.]
The Pumping Lemma:

For any infinite context-free language $L$ there exists an integer $m$ such that

for any string $w \in L$ with $|w| \ge m$

we can write $w = uvxyz$ with lengths $|vxy| \le m$ and $|vy| \ge 1$

and it must be that $uv^{i}xy^{i}z \in L$ for all $i \ge 0$.
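The statement can be explored by brute force on a small context-free language. The sketch below enumerates every split $w = uvxyz$ allowed by the lemma and keeps those that stay inside $\{a^{n}b^{n} : n \ge 0\}$ when pumped. The value $m = 2$, the range of $i$, and the names `decompositions`, `pump` and `in_anbn` are assumptions made for this illustration; checking finitely many $i$ is an experiment, not a proof.

```python
def decompositions(w, m):
    """Yield every split w = u v x y z with |vxy| <= m and |vy| >= 1."""
    n = len(w)
    for start in range(n):                                   # vxy = w[start:end]
        for end in range(start + 1, min(start + m, n) + 1):
            for p in range(start, end + 1):                  # v = w[start:p]
                for q in range(p, end + 1):                  # x = w[p:q], y = w[q:end]
                    v, x, y = w[start:p], w[p:q], w[q:end]
                    if v or y:
                        yield w[:start], v, x, y, w[end:]


def pump(parts, i):
    """Build u v^i x y^i z from a 5-tuple (u, v, x, y, z)."""
    u, v, x, y, z = parts
    return u + v * i + x + y * i + z


def in_anbn(s):
    """Membership test for {a^n b^n : n >= 0}."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


w, m = "aaabbb", 2
survivors = [
    parts for parts in decompositions(w, m)
    if all(in_anbn(pump(parts, i)) for i in range(5))
]
print(survivors)
```

For this $w$ the split $u = aa$, $v = a$, $x = \lambda$, $y = b$, $z = bb$ survives, since pumping adds one $a$ and one $b$ at the boundary each time.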
Applications of the Pumping Lemma


Non-context-free languages: $\{a^{n}b^{n}c^{n} : n \ge 0\}$

Context-free languages: $\{a^{n}b^{n} : n \ge 0\}$


Theorem: The language $L = \{a^{n}b^{n}c^{n} : n \ge 0\}$ is not context-free.

Proof: Use the Pumping Lemma for context-free languages.


$L = \{a^{n}b^{n}c^{n} : n \ge 0\}$

Assume for contradiction that $L$ is context-free.

Since $L$ is context-free and infinite, we can apply the pumping lemma.

The Pumping Lemma gives a magic number $m$ such that we may pick any string $w \in L$ with length $|w| \ge m$.

We pick: $w = a^{m}b^{m}c^{m}$

We can write $w = uvxyz$ with lengths $|vxy| \le m$ and $|vy| \ge 1$.

The Pumping Lemma says: $uv^{i}xy^{i}z \in L$ for all $i \ge 0$.

We examine all the possible locations of string $vxy$ in $w$.


Case 1: $vxy$ is within $a^{m}$

$w = \underbrace{aaa \ldots aaa}_{m}\ \underbrace{bbb \ldots bbb}_{m}\ \underbrace{ccc \ldots ccc}_{m}$, with $vxy$ inside the first block

$v$ and $y$ consist only of $a$'s.

Repeating $v$ and $y$, with $k = |vy| \ge 1$:

$uv^{2}xy^{2}z = a^{m+k}b^{m}c^{m}$

From the Pumping Lemma: $uv^{2}xy^{2}z \in L$

However: $uv^{2}xy^{2}z = a^{m+k}b^{m}c^{m} \notin L$

Contradiction!
Case 2: $vxy$ is within $b^{m}$

Similar analysis to Case 1.
Case 3: $vxy$ is within $c^{m}$

Similar analysis to Case 1.
Case 4: $vxy$ overlaps $a^{m}$ and $b^{m}$

Possibility 1: $v$ contains only $a$, $y$ contains only $b$

Repeating $v$ and $y$, with $k_1 + k_2 \ge 1$:

$uv^{2}xy^{2}z = a^{m+k_1}b^{m+k_2}c^{m}$

From the Pumping Lemma: $uv^{2}xy^{2}z \in L$

However: $uv^{2}xy^{2}z = a^{m+k_1}b^{m+k_2}c^{m} \notin L$

Contradiction!
Case 4: Possibility 2: $v$ contains $a$ and $b$, $y$ contains only $b$

Repeating $v$ and $y$, with $k_1 + k_2 + k \ge 1$:

$uv^{2}xy^{2}z = a^{m}b^{k_1}a^{k_2}b^{m+k}c^{m}$

From the Pumping Lemma: $uv^{2}xy^{2}z \in L$

However: $uv^{2}xy^{2}z = a^{m}b^{k_1}a^{k_2}b^{m+k}c^{m} \notin L$

Contradiction!
Case 4: Possibility 3: $v$ contains only $a$, $y$ contains $a$ and $b$

Similar analysis to Possibility 2.


Case 5: $vxy$ overlaps $b^{m}$ and $c^{m}$

Similar analysis to Case 4.
There are no other cases to consider

(since $|vxy| \le m$, string $vxy$ cannot overlap $a^{m}$, $b^{m}$ and $c^{m}$ at the same time).


In all cases we obtained a contradiction.

Therefore: the original assumption that $L = \{a^{n}b^{n}c^{n} : n \ge 0\}$ is context-free must be wrong.

Conclusion: $L$ is not context-free.
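The case analysis can be sanity-checked exhaustively for small values of $m$. The sketch below tries every split of $w = a^{m}b^{m}c^{m}$ with $|vxy| \le m$ and $|vy| \ge 1$ and records any split whose pumped string $uv^{2}xy^{2}z$ is still in $L$. The function names and the tested range of $m$ are assumptions for this illustration; the check supports the proof but does not replace it.

```python
def in_anbncn(s):
    """Membership test for {a^n b^n c^n : n >= 0}."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n


def surviving_splits(w, m, in_language, i=2):
    """Return all splits w = u v x y z (|vxy| <= m, |vy| >= 1)
    for which u v^i x y^i z is still in the language."""
    survivors = []
    n = len(w)
    for start in range(n):                                   # vxy = w[start:end]
        for end in range(start + 1, min(start + m, n) + 1):
            for p in range(start, end + 1):                  # v = w[start:p]
                for q in range(p, end + 1):                  # x = w[p:q], y = w[q:end]
                    u, v, x, y, z = w[:start], w[start:p], w[p:q], w[q:end], w[end:]
                    if (v or y) and in_language(u + v * i + x + y * i + z):
                        survivors.append((u, v, x, y, z))
    return survivors


for m in range(1, 6):
    w = "a" * m + "b" * m + "c" * m
    print(m, surviving_splits(w, m, in_anbncn))
```

Every list printed is empty: because $vxy$ touches at most two of the three blocks, pumping with $i = 2$ always leaves at least one block at length $m$ while the string grows, which matches Cases 1 to 5.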
