You are on page 1of 75

CS 301

Lecture 10 – Chomsky Normal Form

1 / 23
More CFLs
i j k
• A = {a b c ∣ i ≤ j or i = k}
• B = {w ∣ w ∈ {a, b, c} contains the same number of as as bs and cs combined}

m n m+n
• C = {1 +1 =1 ∣ m, n ≥ 1}; Σ = {1, +, =}
• D = (abb ∣ bbaa)
∗ ∗

R
• E = {w ∣ w ∈ {0, 1} and w

is a binary number not divisible by 5}

2 / 23
Another proof that regular languages are context-free
We can encode the computation of a DFA on a string using a CFG

Give a DFA M = (Q, Σ, δ, q0 , F ), we can construct an equivalent CFG


G = (V, Σ, R, S) where

3 / 23
Another proof that regular languages are context-free
We can encode the computation of a DFA on a string using a CFG

Give a DFA M = (Q, Σ, δ, q0 , F ), we can construct an equivalent CFG


G = (V, Σ, R, S) where
• states of M are variables in G

3 / 23
Another proof that regular languages are context-free
We can encode the computation of a DFA on a string using a CFG

Give a DFA M = (Q, Σ, δ, q0 , F ), we can construct an equivalent CFG


G = (V, Σ, R, S) where
• states of M are variables in G
• q0 is the start variable, and

3 / 23
Another proof that regular languages are context-free
We can encode the computation of a DFA on a string using a CFG

Give a DFA M = (Q, Σ, δ, q0 , F ), we can construct an equivalent CFG


G = (V, Σ, R, S) where
• states of M are variables in G
• q0 is the start variable, and
• transitions δ(q, t) = r become rules q → tr

3 / 23
Another proof that regular languages are context-free
We can encode the computation of a DFA on a string using a CFG

Give a DFA M = (Q, Σ, δ, q0 , F ), we can construct an equivalent CFG


G = (V, Σ, R, S) where
• states of M are variables in G
• q0 is the start variable, and
• transitions δ(q, t) = r become rules q → tr

If on input w = w1 w2 ⋯wn , M goes through states r0 , r1 , . . . , rn , then

r0 ⇒ w1 r1 ⇒ w1 w2 r2 ⇒ ⋯ ⇒ w1 w2 ⋯wn rn

3 / 23
Another proof that regular languages are context-free
We can encode the computation of a DFA on a string using a CFG

Give a DFA M = (Q, Σ, δ, q0 , F ), we can construct an equivalent CFG


G = (V, Σ, R, S) where
• states of M are variables in G
• q0 is the start variable, and
• transitions δ(q, t) = r become rules q → tr

If on input w = w1 w2 ⋯wn , M goes through states r0 , r1 , . . . , rn , then

r0 ⇒ w1 r1 ⇒ w1 w2 r2 ⇒ ⋯ ⇒ w1 w2 ⋯wn rn

So G has derived the string wrn but this still has a variable

What additional rules should we add to end up with a string of terminals?


For each state q ∈ F , add a rule q → ε

3 / 23
Formally
Proof.
Given a DFA M = (Q, Σ, δ, q0 , F ), we can construct an equivalent CFG
G = (V, Σ, R, S) where

V =Q
S = q0
R = {q → tr ∶ δ(q, t) = r} ∪ {q → ε ∶ q ∈ F }

4 / 23
Formally
Proof.
Given a DFA M = (Q, Σ, δ, q0 , F ), we can construct an equivalent CFG
G = (V, Σ, R, S) where

V =Q
S = q0
R = {q → tr ∶ δ(q, t) = r} ∪ {q → ε ∶ q ∈ F }

If r0 , r1 , . . . , rn is the computation of M on input w = w1 w2 ⋯wn , then r0 = q0 and


δ(ri−1 , wi ) = ri for 1 ≤ i ≤ n


By construction r0 ⇒ w1 r1 ⇒ w1 w2 r2 ⇒ w1 w2 ⋯wn rn

Therefore, w ∈ L(M ) iff rn ∈ F iff rn ⇒ ε iff q0 ⇒ w iff w ∈ L(G)

4 / 23
Returning to our language

R
E = {w ∣ w ∈ {0,1} and w

is a binary number not divisible by 5}

0 Q1
1 0

Q0 1 Q2
0
1
1

Q4 Q3
1 0

5 / 23
Returning to our language

R
E = {w ∣ w ∈ {0,1} and w

is a binary number not divisible by 5}

Q1
0
1 0 Q0 → 0Q0 ∣ 1Q2
Q1 → 0Q3 ∣ 1Q0 ∣ ε
Q0 1 Q2 Q2 → 0Q1 ∣ 1Q3 ∣ ε
0 Q3 → 0Q4 ∣ 1Q1 ∣ ε
1
Q4 → 0Q2 ∣ 1Q4 ∣ ε
1

Q4 Q3
1 0

5 / 23
Chomsky Normal Form (CNF)
A CFG G = (V, Σ, R, S) is in Chomsky Normal Form if all rules have one of these
forms
• S→ε where S is the start variable
• A → BC where A ∈ V and B, C ∈ V ∖ {S}
• A→t where A ∈ V and t ∈ Σ

Note
• The only rule with ε on the right has the start variable on the left
• The start variable doesn’t appear on the right hand side of any rule

6 / 23
CNF example
R
Let A = {w ∣ w ∈ {a, b} and w = w }.

CFG in CNF Derivation of baaab

S → AU ∣ BV ∣ a ∣ b ∣ ε S
T → AU ∣ BV ∣ a ∣ b
U → TA
V → TB
A→a
B→b

7 / 23
CNF example
R
Let A = {w ∣ w ∈ {a, b} and w = w }.

CFG in CNF Derivation of baaab

S → AU ∣ BV ∣ a ∣ b ∣ ε S ⇒ BV
T → AU ∣ BV ∣ a ∣ b
U → TA
V → TB
A→a
B→b

7 / 23
CNF example
R
Let A = {w ∣ w ∈ {a, b} and w = w }.

CFG in CNF Derivation of baaab

S → AU ∣ BV ∣ a ∣ b ∣ ε S ⇒ BV
T → AU ∣ BV ∣ a ∣ b ⇒ bV
U → TA
V → TB
A→a
B→b

7 / 23
CNF example
R
Let A = {w ∣ w ∈ {a, b} and w = w }.

CFG in CNF Derivation of baaab

S → AU ∣ BV ∣ a ∣ b ∣ ε S ⇒ BV
T → AU ∣ BV ∣ a ∣ b ⇒ bV
U → TA ⇒ bT B
V → TB
A→a
B→b

7 / 23
CNF example
R
Let A = {w ∣ w ∈ {a, b} and w = w }.

CFG in CNF Derivation of baaab

S → AU ∣ BV ∣ a ∣ b ∣ ε S ⇒ BV
T → AU ∣ BV ∣ a ∣ b ⇒ bV
U → TA ⇒ bT B
V → TB ⇒ bAU B
A→a
B→b

7 / 23
CNF example
R
Let A = {w ∣ w ∈ {a, b} and w = w }.

CFG in CNF Derivation of baaab

S → AU ∣ BV ∣ a ∣ b ∣ ε S ⇒ BV
T → AU ∣ BV ∣ a ∣ b ⇒ bV
U → TA ⇒ bT B
V → TB ⇒ bAU B
A→a ⇒ baU B
B→b

7 / 23
CNF example
R
Let A = {w ∣ w ∈ {a, b} and w = w }.

CFG in CNF Derivation of baaab

S → AU ∣ BV ∣ a ∣ b ∣ ε S ⇒ BV
T → AU ∣ BV ∣ a ∣ b ⇒ bV
U → TA ⇒ bT B
V → TB ⇒ bAU B
A→a ⇒ baU B
B→b ⇒ baT AB

7 / 23
CNF example
R
Let A = {w ∣ w ∈ {a, b} and w = w }.

CFG in CNF Derivation of baaab

S → AU ∣ BV ∣ a ∣ b ∣ ε S ⇒ BV
T → AU ∣ BV ∣ a ∣ b ⇒ bV
U → TA ⇒ bT B
V → TB ⇒ bAU B
A→a ⇒ baU B
B→b ⇒ baT AB
⇒ baaAB

7 / 23
CNF example
R
Let A = {w ∣ w ∈ {a, b} and w = w }.

CFG in CNF Derivation of baaab

S → AU ∣ BV ∣ a ∣ b ∣ ε S ⇒ BV
T → AU ∣ BV ∣ a ∣ b ⇒ bV
U → TA ⇒ bT B
V → TB ⇒ bAU B
A→a ⇒ baU B
B→b ⇒ baT AB
⇒ baaAB
⇒ baaaB

7 / 23
CNF example
R
Let A = {w ∣ w ∈ {a, b} and w = w }.

CFG in CNF Derivation of baaab

S → AU ∣ BV ∣ a ∣ b ∣ ε S ⇒ BV
T → AU ∣ BV ∣ a ∣ b ⇒ bV
U → TA ⇒ bT B
V → TB ⇒ bAU B
A→a ⇒ baU B
B→b ⇒ baT AB
⇒ baaAB
⇒ baaaB
⇒ baaab

7 / 23
Converting to CNF
Theorem
Every context-free language A is generated by some CFG in CNF.

Proof.
Given a CFG G = (V, Σ, R, S) generating A, we construct a new CFG
G = (V , Σ, R , S ) in CNF generating A.
′ ′ ′ ′

There are five steps.


START Add a new start variable
BIN Replace rules with RHS longer than two with multiple rules each of
which has a RHS of length two
DEL-ε Remove all ε-rules (A → ε)
UNIT Remove all unit-rules (A → B)
TERM Add a variable and rule for each terminal (T → t) and replace terminals
on the RHS of rules

8 / 23
Proof continued
In the following x ∈ V ∪ Σ and u ∈ (Σ ∪ V )
+

′ ′
START Add a new start variable S and a rule S → S

9 / 23
Proof continued
In the following x ∈ V ∪ Σ and u ∈ (Σ ∪ V )
+

′ ′
START Add a new start variable S and a rule S → S
BIN Replace each rule A → xu with the rules A → xA1 and A1 → u and
repeat until the RHS of every rule has length at most two

9 / 23
Proof continued
In the following x ∈ V ∪ Σ and u ∈ (Σ ∪ V )
+

′ ′
START Add a new start variable S and a rule S → S
BIN Replace each rule A → xu with the rules A → xA1 and A1 → u and
repeat until the RHS of every rule has length at most two

DEL-ε For each rule of the form A → ε other than S → ε remove A → ε and
update all rules with A in the RHS

9 / 23
Proof continued
In the following x ∈ V ∪ Σ and u ∈ (Σ ∪ V )
+

′ ′
START Add a new start variable S and a rule S → S
BIN Replace each rule A → xu with the rules A → xA1 and A1 → u and
repeat until the RHS of every rule has length at most two

DEL-ε For each rule of the form A → ε other than S → ε remove A → ε and
update all rules with A in the RHS
• B → A. Add rule B → ε unless B → ε has already been removed

9 / 23
Proof continued
In the following x ∈ V ∪ Σ and u ∈ (Σ ∪ V )
+

′ ′
START Add a new start variable S and a rule S → S
BIN Replace each rule A → xu with the rules A → xA1 and A1 → u and
repeat until the RHS of every rule has length at most two

DEL-ε For each rule of the form A → ε other than S → ε remove A → ε and
update all rules with A in the RHS
• B → A. Add rule B → ε unless B → ε has already been removed
• B → AA. Add rule B → A and if B → ε has not already been
removed, add it

9 / 23
Proof continued
In the following x ∈ V ∪ Σ and u ∈ (Σ ∪ V )
+

′ ′
START Add a new start variable S and a rule S → S
BIN Replace each rule A → xu with the rules A → xA1 and A1 → u and
repeat until the RHS of every rule has length at most two

DEL-ε For each rule of the form A → ε other than S → ε remove A → ε and
update all rules with A in the RHS
• B → A. Add rule B → ε unless B → ε has already been removed
• B → AA. Add rule B → A and if B → ε has not already been
removed, add it
• B → xA or B → Ax. Add rule B → x

9 / 23
Proof continued
In the following x ∈ V ∪ Σ and u ∈ (Σ ∪ V )
+

′ ′
START Add a new start variable S and a rule S → S
BIN Replace each rule A → xu with the rules A → xA1 and A1 → u and
repeat until the RHS of every rule has length at most two

DEL-ε For each rule of the form A → ε other than S → ε remove A → ε and
update all rules with A in the RHS
• B → A. Add rule B → ε unless B → ε has already been removed
• B → AA. Add rule B → A and if B → ε has not already been
removed, add it
• B → xA or B → Ax. Add rule B → x
UNIT For each rule A → B, remove it and add rules A → u for each B → u
unless A → u is a unit rule already removed

9 / 23
Proof continued
In the following x ∈ V ∪ Σ and u ∈ (Σ ∪ V )
+

′ ′
START Add a new start variable S and a rule S → S
BIN Replace each rule A → xu with the rules A → xA1 and A1 → u and
repeat until the RHS of every rule has length at most two

DEL-ε For each rule of the form A → ε other than S → ε remove A → ε and
update all rules with A in the RHS
• B → A. Add rule B → ε unless B → ε has already been removed
• B → AA. Add rule B → A and if B → ε has not already been
removed, add it
• B → xA or B → Ax. Add rule B → x
UNIT For each rule A → B, remove it and add rules A → u for each B → u
unless A → u is a unit rule already removed
TERM For each t ∈ Σ, add a new variable T and a rule T → t; replace each t in
the RHS of nonunit rules with T

9 / 23
Proof continued
In the following x ∈ V ∪ Σ and u ∈ (Σ ∪ V )
+

′ ′
START Add a new start variable S and a rule S → S
BIN Replace each rule A → xu with the rules A → xA1 and A1 → u and
repeat until the RHS of every rule has length at most two

DEL-ε For each rule of the form A → ε other than S → ε remove A → ε and
update all rules with A in the RHS
• B → A. Add rule B → ε unless B → ε has already been removed
• B → AA. Add rule B → A and if B → ε has not already been
removed, add it
• B → xA or B → Ax. Add rule B → x
UNIT For each rule A → B, remove it and add rules A → u for each B → u
unless A → u is a unit rule already removed
TERM For each t ∈ Σ, add a new variable T and a rule T → t; replace each t in
the RHS of nonunit rules with T
Each of the five steps preserves the language generated by the grammar so
L(G ) = A.

9 / 23
Example
Convert to CNF
A → BAB ∣ B ∣ ε
B → 00 ∣ ε

START:

10 / 23
Example
Convert to CNF
A → BAB ∣ B ∣ ε
B → 00 ∣ ε

START:
S→A
A → BAB ∣ B ∣ ε
B → 00 ∣ ε

10 / 23
Example
Convert to CNF
A → BAB ∣ B ∣ ε
B → 00 ∣ ε

START:
S→A
A → BAB ∣ B ∣ ε
B → 00 ∣ ε

BIN: Replace A → BAB:

10 / 23
Example
Convert to CNF
A → BAB ∣ B ∣ ε
B → 00 ∣ ε

START:
S→A
A → BAB ∣ B ∣ ε
B → 00 ∣ ε

BIN: Replace A → BAB:


S→A
A → BA1 ∣ B ∣ ε
B → 00 ∣ ε
A1 → AB

10 / 23
Example
Convert to CNF DEL-ε: Remove A → ε:
A → BAB ∣ B ∣ ε
B → 00 ∣ ε

START:
S→A
A → BAB ∣ B ∣ ε
B → 00 ∣ ε

BIN: Replace A → BAB:


S→A
A → BA1 ∣ B ∣ ε
B → 00 ∣ ε
A1 → AB

10 / 23
Example
Convert to CNF DEL-ε: Remove A → ε:
A → BAB ∣ B ∣ ε S→A∣ε
B → 00 ∣ ε A → BA1 ∣ B
B → 00 ∣ ε
START:
S→A A1 → AB ∣ B
A → BAB ∣ B ∣ ε
B → 00 ∣ ε

BIN: Replace A → BAB:


S→A
A → BA1 ∣ B ∣ ε
B → 00 ∣ ε
A1 → AB

10 / 23
Example
Convert to CNF DEL-ε: Remove A → ε:
A → BAB ∣ B ∣ ε S→A∣ε
B → 00 ∣ ε A → BA1 ∣ B
B → 00 ∣ ε
START:
S→A A1 → AB ∣ B
A → BAB ∣ B ∣ ε Remove B → ε:
B → 00 ∣ ε

BIN: Replace A → BAB:


S→A
A → BA1 ∣ B ∣ ε
B → 00 ∣ ε
A1 → AB

10 / 23
Example
Convert to CNF DEL-ε: Remove A → ε:
A → BAB ∣ B ∣ ε S→A∣ε
B → 00 ∣ ε A → BA1 ∣ B
B → 00 ∣ ε
START:
S→A A1 → AB ∣ B
A → BAB ∣ B ∣ ε Remove B → ε:
B → 00 ∣ ε S→A∣ε
A → BA1 ∣ B ∣ A1
BIN: Replace A → BAB:
S→A B → 00
A → BA1 ∣ B ∣ ε A1 → AB ∣ B ∣ A ∣ ε
B → 00 ∣ ε Don’t add A → ε because we
A1 → AB already removed it

10 / 23
Example
Convert to CNF DEL-ε: Remove A → ε: Remove A1 → ε:
A → BAB ∣ B ∣ ε S→A∣ε
B → 00 ∣ ε A → BA1 ∣ B
B → 00 ∣ ε
START:
S→A A1 → AB ∣ B
A → BAB ∣ B ∣ ε Remove B → ε:
B → 00 ∣ ε S→A∣ε
A → BA1 ∣ B ∣ A1
BIN: Replace A → BAB:
S→A B → 00
A → BA1 ∣ B ∣ ε A1 → AB ∣ B ∣ A ∣ ε
B → 00 ∣ ε Don’t add A → ε because we
A1 → AB already removed it

10 / 23
Example
Convert to CNF DEL-ε: Remove A → ε: Remove A1 → ε:
A → BAB ∣ B ∣ ε S→A∣ε S→A∣ε
B → 00 ∣ ε A → BA1 ∣ B A → BA1 ∣ B ∣ A1
B → 00 ∣ ε B → 00
START:
S→A A1 → AB ∣ B A1 → AB ∣ B ∣ A
A → BAB ∣ B ∣ ε Remove B → ε: Don’t add A → ε because we
B → 00 ∣ ε S→A∣ε already removed it
A → BA1 ∣ B ∣ A1
BIN: Replace A → BAB:
S→A B → 00
A → BA1 ∣ B ∣ ε A1 → AB ∣ B ∣ A ∣ ε
B → 00 ∣ ε Don’t add A → ε because we
A1 → AB already removed it

10 / 23
Example
Convert to CNF DEL-ε: Remove A → ε: Remove A1 → ε:
A → BAB ∣ B ∣ ε S→A∣ε S→A∣ε
B → 00 ∣ ε A → BA1 ∣ B A → BA1 ∣ B ∣ A1
B → 00 ∣ ε B → 00
START:
S→A A1 → AB ∣ B A1 → AB ∣ B ∣ A
A → BAB ∣ B ∣ ε Remove B → ε: Don’t add A → ε because we
B → 00 ∣ ε S→A∣ε already removed it
A → BA1 ∣ B ∣ A1
BIN: Replace A → BAB: UNIT: Remove S → A
S→A B → 00
A → BA1 ∣ B ∣ ε A1 → AB ∣ B ∣ A ∣ ε
B → 00 ∣ ε Don’t add A → ε because we
A1 → AB already removed it

10 / 23
Example
Convert to CNF DEL-ε: Remove A → ε: Remove A1 → ε:
A → BAB ∣ B ∣ ε S→A∣ε S→A∣ε
B → 00 ∣ ε A → BA1 ∣ B A → BA1 ∣ B ∣ A1
B → 00 ∣ ε B → 00
START:
S→A A1 → AB ∣ B A1 → AB ∣ B ∣ A
A → BAB ∣ B ∣ ε Remove B → ε: Don’t add A → ε because we
B → 00 ∣ ε S→A∣ε already removed it
A → BA1 ∣ B ∣ A1
BIN: Replace A → BAB: UNIT: Remove S → A
B → 00
S→A S → BA1 ∣ B ∣ A1 ∣ ε
A → BA1 ∣ B ∣ ε A1 → AB ∣ B ∣ A ∣ ε
A → BA1 ∣ B ∣ A1
B → 00 ∣ ε Don’t add A → ε because we B → 00
A1 → AB already removed it A1 → AB ∣ B ∣ A

10 / 23
Example continued
From previous slide
S → BA1 ∣ B ∣ A1 ∣ ε
A → BA1 ∣ B ∣ A1
B → 00
A1 → AB ∣ B ∣ A

11 / 23
Example continued
From previous slide
S → BA1 ∣ B ∣ A1 ∣ ε
A → BA1 ∣ B ∣ A1
B → 00
A1 → AB ∣ B ∣ A

Remove S → B

11 / 23
Example continued
From previous slide
S → BA1 ∣ B ∣ A1 ∣ ε
A → BA1 ∣ B ∣ A1
B → 00
A1 → AB ∣ B ∣ A

Remove S → B
S → BA1 ∣ A1 ∣ ε ∣ 00
A → BA1 ∣ B ∣ A1
B → 00
A1 → AB ∣ B ∣ A

11 / 23
Example continued
From previous slide Remove S → A1
S → BA1 ∣ B ∣ A1 ∣ ε
A → BA1 ∣ B ∣ A1
B → 00
A1 → AB ∣ B ∣ A

Remove S → B
S → BA1 ∣ A1 ∣ ε ∣ 00
A → BA1 ∣ B ∣ A1
B → 00
A1 → AB ∣ B ∣ A

11 / 23
Example continued
From previous slide Remove S → A1
S → BA1 ∣ B ∣ A1 ∣ ε S → BA1 ∣ ε ∣ 00 ∣ AB
A → BA1 ∣ B ∣ A1 A → BA1 ∣ B ∣ A1
B → 00 B → 00
A1 → AB ∣ B ∣ A A1 → AB ∣ B ∣ A

Remove S → B Don’t add S → B or S → A


S → BA1 ∣ A1 ∣ ε ∣ 00 because we removed them
A → BA1 ∣ B ∣ A1
B → 00
A1 → AB ∣ B ∣ A

11 / 23
Example continued
From previous slide Remove S → A1
S → BA1 ∣ B ∣ A1 ∣ ε S → BA1 ∣ ε ∣ 00 ∣ AB
A → BA1 ∣ B ∣ A1 A → BA1 ∣ B ∣ A1
B → 00 B → 00
A1 → AB ∣ B ∣ A A1 → AB ∣ B ∣ A

Remove S → B Don’t add S → B or S → A


S → BA1 ∣ A1 ∣ ε ∣ 00 because we removed them
A → BA1 ∣ B ∣ A1
Remove A → B
B → 00
A1 → AB ∣ B ∣ A

11 / 23
Example continued
From previous slide Remove S → A1
S → BA1 ∣ B ∣ A1 ∣ ε S → BA1 ∣ ε ∣ 00 ∣ AB
A → BA1 ∣ B ∣ A1 A → BA1 ∣ B ∣ A1
B → 00 B → 00
A1 → AB ∣ B ∣ A A1 → AB ∣ B ∣ A

Remove S → B Don’t add S → B or S → A


S → BA1 ∣ A1 ∣ ε ∣ 00 because we removed them
A → BA1 ∣ B ∣ A1
Remove A → B
B → 00
S → BA1 ∣ ε ∣ 00 ∣ AB
A1 → AB ∣ B ∣ A
A → BA1 ∣ A1 ∣ 00
B → 00
A1 → AB ∣ B ∣ A

11 / 23
Example continued
From previous slide Remove S → A1 Remove A → A1
S → BA1 ∣ B ∣ A1 ∣ ε S → BA1 ∣ ε ∣ 00 ∣ AB
A → BA1 ∣ B ∣ A1 A → BA1 ∣ B ∣ A1
B → 00 B → 00
A1 → AB ∣ B ∣ A A1 → AB ∣ B ∣ A

Remove S → B Don’t add S → B or S → A


S → BA1 ∣ A1 ∣ ε ∣ 00 because we removed them
A → BA1 ∣ B ∣ A1
Remove A → B
B → 00
S → BA1 ∣ ε ∣ 00 ∣ AB
A1 → AB ∣ B ∣ A
A → BA1 ∣ A1 ∣ 00
B → 00
A1 → AB ∣ B ∣ A

11 / 23
Example continued
From previous slide Remove S → A1 Remove A → A1
S → BA1 ∣ B ∣ A1 ∣ ε S → BA1 ∣ ε ∣ 00 ∣ AB S → BA1 ∣ ε ∣ 00 ∣ AB
A → BA1 ∣ B ∣ A1 A → BA1 ∣ B ∣ A1 A → BA1 ∣ 00 ∣ AB
B → 00 B → 00 B → 00
A1 → AB ∣ B ∣ A A1 → AB ∣ B ∣ A A1 → AB ∣ B ∣ A

Remove S → B Don’t add S → B or S → ADon’t add A → B because


S → BA1 ∣ A1 ∣ ε ∣ 00 because we removed them we removed it
A → BA1 ∣ B ∣ A1 Don’t add A → A because
Remove A → B it’s useless
B → 00
S → BA1 ∣ ε ∣ 00 ∣ AB
A1 → AB ∣ B ∣ A
A → BA1 ∣ A1 ∣ 00
B → 00
A1 → AB ∣ B ∣ A

11 / 23
Example continued
From previous slide Remove S → A1 Remove A → A1
S → BA1 ∣ B ∣ A1 ∣ ε S → BA1 ∣ ε ∣ 00 ∣ AB S → BA1 ∣ ε ∣ 00 ∣ AB
A → BA1 ∣ B ∣ A1 A → BA1 ∣ B ∣ A1 A → BA1 ∣ 00 ∣ AB
B → 00 B → 00 B → 00
A1 → AB ∣ B ∣ A A1 → AB ∣ B ∣ A A1 → AB ∣ B ∣ A

Remove S → B Don’t add S → B or S → ADon’t add A → B because


S → BA1 ∣ A1 ∣ ε ∣ 00 because we removed them we removed it
A → BA1 ∣ B ∣ A1 Don’t add A → A because
Remove A → B it’s useless
B → 00
S → BA1 ∣ ε ∣ 00 ∣ AB
A1 → AB ∣ B ∣ A
A → BA1 ∣ A1 ∣ 00 Remove A1 → B
B → 00
A1 → AB ∣ B ∣ A

11 / 23
Example continued
From previous slide Remove S → A1 Remove A → A1
S → BA1 ∣ B ∣ A1 ∣ ε S → BA1 ∣ ε ∣ 00 ∣ AB S → BA1 ∣ ε ∣ 00 ∣ AB
A → BA1 ∣ B ∣ A1 A → BA1 ∣ B ∣ A1 A → BA1 ∣ 00 ∣ AB
B → 00 B → 00 B → 00
A1 → AB ∣ B ∣ A A1 → AB ∣ B ∣ A A1 → AB ∣ B ∣ A

Remove S → B Don’t add S → B or S → ADon’t add A → B because


S → BA1 ∣ A1 ∣ ε ∣ 00 because we removed them we removed it
A → BA1 ∣ B ∣ A1 Don’t add A → A because
Remove A → B it’s useless
B → 00
S → BA1 ∣ ε ∣ 00 ∣ AB
A1 → AB ∣ B ∣ A
A → BA1 ∣ A1 ∣ 00 Remove A1 → B
S → BA1 ∣ ε ∣ 00 ∣ AB
B → 00
A → BA1 ∣ 00 ∣ AB
A1 → AB ∣ B ∣ A
B → 00
A1 → AB ∣ A ∣ 00
11 / 23
Example continued
Copied from the previous slide
S → BA1 ∣ ε ∣ 00 ∣ AB
A → BA1 ∣ 00 ∣ AB
B → 00
A1 → AB ∣ A ∣ 00

Remove A1 → A

12 / 23
Example continued
Copied from the previous slide
S → BA1 ∣ ε ∣ 00 ∣ AB
A → BA1 ∣ 00 ∣ AB
B → 00
A1 → AB ∣ A ∣ 00

Remove A1 → A
S → BA1 ∣ ε ∣ 00 ∣ AB
A → BA1 ∣ 00 ∣ AB
B → 00
A1 → AB ∣ 00 ∣ BA1

12 / 23
Example continued
Copied from the previous slide TERM: Add Z → 0
S → BA1 ∣ ε ∣ 00 ∣ AB
A → BA1 ∣ 00 ∣ AB
B → 00
A1 → AB ∣ A ∣ 00

Remove A1 → A
S → BA1 ∣ ε ∣ 00 ∣ AB
A → BA1 ∣ 00 ∣ AB
B → 00
A1 → AB ∣ 00 ∣ BA1

12 / 23
Example continued
Copied from the previous slide TERM: Add Z → 0
S → BA1 ∣ ε ∣ 00 ∣ AB S → BA1 ∣ ε ∣ ZZ ∣ AB
A → BA1 ∣ 00 ∣ AB A → BA1 ∣ ZZ ∣ AB
B → 00 B → ZZ
A1 → AB ∣ A ∣ 00 A1 → AB ∣ ZZ ∣ BA1
Z→0
Remove A1 → A
S → BA1 ∣ ε ∣ 00 ∣ AB
A → BA1 ∣ 00 ∣ AB
B → 00
A1 → AB ∣ 00 ∣ BA1

12 / 23
Caution
Sipser gives a different procedure
1 START
2 DEL-ε
3 UNIT
4 BIN
5 TERM
This procedure works but can lead to an exponential blow up in the number of rules!

2∣G∣
In general, if DEL-ε comes before BIN, then ∣G ∣ is O(2 );

2
if BIN comes before DEL-ε, then ∣G ∣ is O(∣G∣ )

UNIT is responsible for the quadratic blow up

So use whichever procedure you’d like, but Sipser’s can be very bad
(Sipser’s is bad if you have long rules with lots of variables with ε-rules)

13 / 23
Example blow up

A → BCDEEDCB ∣ CBEDDEBC
B→0∣ε
C→1∣ε
D→2∣ε
E→3∣ε

has five variables and 10 rules

Converting using START, BIN, DEL-ε, UNIT, TERM gives a CFG with 18 variables
and 125 rules

14 / 23
Example blow up

A → BCDEEDCB ∣ CBEDDEBC
B→0∣ε
C→1∣ε
D→2∣ε
E→3∣ε

has five variables and 10 rules

Converting using START, BIN, DEL-ε, UNIT, TERM gives a CFG with 18 variables
and 125 rules

Converting using START, DEL-ε, UNIT, BIN, TERM gives a CFG with 1394 variables
and 1953 rules

14 / 23
Prefix
Recall Prefix(L) = {w ∣ for some x ∈ Σ , wx ∈ L}

Theorem
The class of context-free languages is closed under Prefix.

15 / 23
Prefix
Recall Prefix(L) = {w ∣ for some x ∈ Σ , wx ∈ L}

Theorem
The class of context-free languages is closed under Prefix.

Proof idea
R
Consider the language {w#w ∣ w ∈ {a, b} } generated by

T → aT a ∣ bT b ∣ #

15 / 23
Prefix
Recall Prefix(L) = {w ∣ for some x ∈ Σ , wx ∈ L}

Theorem
The class of context-free languages is closed under Prefix.

Proof idea
R
Consider the language {w#w ∣ w ∈ {a, b} } generated by

T → aT a ∣ bT b ∣ #
Let’s convert to CNF

15 / 23
Prefix
Recall Prefix(L) = {w ∣ for some x ∈ Σ , wx ∈ L}

Theorem
The class of context-free languages is closed under Prefix.

Proof idea
R
Consider the language {w#w ∣ w ∈ {a, b} } generated by

T → aT a ∣ bT b ∣ #
Let’s convert to CNF
S → AU ∣ BV ∣ #
T → AU ∣ BV ∣ #
U → TA
V → TB
A→a
B→b

15 / 23
Derivation of ab#ba

S
S ⇒ AU
⇒ aU
⇒ aT A A U
⇒ aBV A
⇒ abV A a T A
⇒ abT BA
⇒ ab#BA
⇒ ab#bA B V a
⇒ ab#ba

The prefix ab# includes b T B


– all terminals from subtrees with a blue root;
– some terminals from subtrees with a violet root;
– no terminals from subtrees with a red root # b
16 / 23
Desired derivation for the prefix
We would like a derivation like this S
S ⇒ AU
⇒ aU
⇒ aT A A U
⇒ aBV A
⇒ abV A
a T A
⇒ abT BA
⇒ ab#BA
⇒ ab#εA B V ε
⇒ ab#εε

Everything left of the violet path is produced b T B


Everything right of the violet path becomes ε
The leaf connected to the violet path is produced
# ε
17 / 23
The proof idea
The violet path corresponds to the point where we “split” the prefix from the
remainder of the string

We want to construct a CFG that keeps track of whether a given variable in the
derivation is
L left of the split,
S part of the split, or
R right of the split
We can construct a new CFG whose variables are ⟨A, L⟩, ⟨A, S⟩, or ⟨A, R⟩ where A is
a variable in the original CFG

We have to deal with the three types of rules


• S→ε
• A → BC
• A→t
and produce new rules corresponding to the variable on the LHS being left of, right of,
or on the split
18 / 23
Proof
If L = ∅, then Prefix(L) = ∅ which is CF.

Otherwise, let L be CF and generated by the CFG G = (V, Σ, R, S) in CNF.

Construct a new CFG (not in CNF) G = (V , Σ, R , S ) where


′ ′ ′ ′

V = {⟨A, D⟩ ∣ A ∈ V and D ∈ {L, S, R}}


S = ⟨S, S⟩

′ ′
Now we just need to specify R . We’ll start with R = ∅ and add rules to it

19 / 23
Proof continued
Since L is nonempty, ε ∈ Prefix(L) so add the rule ⟨S, S⟩ → ε to R


For each rule of the form A → BC in R, add the following rules to R
⟨A, L⟩ → ⟨B, L⟩⟨C, L⟩ left of the split
⟨A, S⟩ → ⟨B, L⟩⟨C, S⟩ ∣ ⟨B, S⟩⟨C, R⟩ one of B or C is on the split
⟨A, R⟩ → ⟨B, R⟩⟨C, R⟩ right of the split

For each rule of the form A → t in R, add the following rules to R
⟨A, L⟩ → t
⟨A, S⟩ → t
⟨A, R⟩ → ε

20 / 23
Proof continued

For each w = w1 w2 ⋯wn ∈ L, S ⇒ A1 A2 ⋯An where Ai ⇒ wi
By construction,

⟨S, S⟩ ⇒ ⟨A1 , L⟩⋯⟨Ai−1 , L⟩⟨Ai , S⟩⟨Ai+1 , R⟩⋯⟨An , R⟩

⇒ w1 w2 ⋯wi

for each 1 ≤ i ≤ n

I.e., G derives the prefix of every string in L

A similar argument works to show that if G derives a string then it’s a prefix of some
string in L

21 / 23
Applying the construction
Deriving ab# ⟨S, S⟩
⟨S, S⟩ ⇒ ⟨A, L⟩⟨U, S⟩
⇒ a⟨U, S⟩
⇒ a⟨T, S⟩⟨A, R⟩ ⟨A, L⟩ ⟨U, S⟩
⇒ a⟨B, L⟩⟨V, S⟩⟨A, R⟩
⇒ ab⟨V, S⟩⟨A, R⟩
a ⟨T, S⟩ ⟨A, R⟩
⇒ ab⟨T, S⟩⟨BA, R⟩
⇒ ab#⟨B, R⟩⟨A, R⟩
⇒ ab#⟨A, R⟩ ⟨B, L⟩ ⟨V, S⟩ ε
⇒ ab#

b ⟨T, S⟩⟨B, R⟩

# ε
22 / 23
Similarities with regular expression
Proving things about
• Regular languages. Assume there exists a regular expression that generates the
language and consider the six cases
• Context-free languages. Assume there exists a CFG that generates the language
and consider the three types of rules

23 / 23

You might also like