You are on page 1of 27

Chomsky and Greibach Normal

Forms
Teodor Rus
rus@cs.uiowa.edu

The University of Iowa, Department of Computer Science

Computation Theory p.1/27

Simplifying a CFG

It is often convenient to simplify a CFG;


One of the simplest and most useful simplified
forms of CFG is called the Chomsky normal
form;
Another normal form usually used in algebraic
specifications is Greibach normal form.

Note the difference between grammar cleaning and grammar simplification!

Computation Theory p.2/27

Fact
Normal forms are useful when more advanced topics
in computation theory are approached, as we shall see
further.

Computation Theory p.3/27

Definition
A context-free grammar G is in Chomsky normal
form if every rule is of the form:
A BC
A a
where a is a terminal, A, B, C are nonterminals, and
B, C may not be the start variable (the axiom).

Computation Theory p.4/27

Observation
The rule S , where S is the start variable, is not
excluded from a CFG in Chomsky normal form.

Computation Theory p.5/27

Theorem 2.9
Any context-free language is generated by a
context-free grammar in Chomsky normal form.
Proof idea:
Show that any CFG G can be converted into a CFG G in Chomsky
normal form;
Conversion procedure has several stages where the rules that violate
Chomsky normal form conditions are replaced with equivalent rules
that satisfy these conditions;
Order of transformations:(1) add a new start variable, (2) eliminate all
-rules, (3) eliminate unit-rules, (4) convert other rules;
Check that the obtained CFG G define the same language as the initial
CFG G.

Computation Theory p.6/27

Proof
Let G = (N, , R, S) be the original CFG.
Step 1: add a new start symbol S0 to N , and the rule
S0 S to R
Note: this change guarantees that the start symbol of
G does not occur on the rhs of any rule.

Computation Theory p.7/27

Step 2: eliminate -rules


Repeat
1. Eliminate the rule A from R where A is not the start symbol;
2. For each occurrence of A on the rhs of a rule, add a new rule to R with
that occurrence of A deleted;
Examples: (1) replace B uAv by B uAv|uv;
(2) replace B uAvAw by B uAvAw|uvAw|aAvw|uvw.
3. Replace the rule B A, (if it is present) by B A| unless the
rule B has been previously eliminated;

until all rules are eliminated.

Computation Theory p.8/27

Step 3: remove unit rules


Repeat:
1. Remove a unit rule A B R;
2. For each rule B u R, add the rule A u to R, unless
B u was a unit rule previously removed;

until all unit rules are eliminated.


Note: u is a string of variables and terminals.

Computation Theory p.9/27

Convert all remaining rules


Repeat:
1. Replace a rule A u1 u2 . . . uk , k 3, where each ui , 1 i k, is
a variable or a terminal, by:
A u1 A1 , A1 u2 A2 , . . ., Ak2 uk1 uk
where A1 ,A2 , . . ., Ak2 are new variables;
2. If k 2 replace any terminal ui with a new variable Ui and add the rule
Ui ui ;

until no rules of the form A u1 u2 . . . uk with k 3


remain.

Computation Theory p.10/27

Example CFG conversion


Consider the grammar G6 whose rules are:
S

ASA|aB

A B|S
B

b|

Notation: symbols removed are green and those added are red.
After first step of transformation we get:
S0

ASA|aB

A
B

B|S
b|

Computation Theory p.11/27

Removing rules
Removing B :
S0

ASA|aB|a

B|S|

b|

S0

ASA|aB|a|SA|AS|S

B|S|

Removing A :

Computation Theory p.12/27

Removing unit rule


Removing S S:
S0

ASA|aB|a|SA|AS|S

B|S

S0

S|ASA|aB|a|SA|AS

ASA|aB|a|SA|AS

B|S

Removing S0 S:

Computation Theory p.13/27

More unit rules


Removing A B:
S0

ASA|aB|a|SA|AS

ASA|aB|a|SA|AS

B|S|b

Removing A S:
S0

ASA|aB|a|SA|AS

ASA|aB|a|SA|AS

S|b|ASA|aB|a|SA|AS

Computation Theory p.14/27

Converting remaining rules


S0

AA1 |U B|a|SA|AS

AA1 |U B|a|SA|AS

b|AA1 |U B|a|SA|AS

A1

SA

Computation Theory p.15/27

Observation

The conversion procedure produces several


variables Ui along with several rules Ui a;
Since all these represent the same rule, we may
simplify the result using a single variable U and a
single rule U a.

Computation Theory p.16/27

Greibach Normal Form


A context-free grammar G = (V, , R, S) is in
Greibach normal form if each rule r R has the
property: lhs(r) V , rhs(r) = a, a and
V .
Note: Greibach normal form provides a justification of
operator prefix-notation usually employed in algebra.

Computation Theory p.17/27

Greibach Theorem
Every CFL L where 6 L can be generated by a CFG
in Greibach normal form.
Proof idea: Let G = (V, , R, S) be a CFG generating L. Assume that G is
in Chomsky normal form
Let V = {A1 , A2 , . . . , Am } be an ordering of nonterminals.
Construct the Greibach normal form from Chomsky normal form.

Computation Theory p.18/27

Construction
1. Modify the rules in R so that if Ai Aj R then j > i
2. Starting with A1 and proceeding to Am this is done as follows:
(a) Assume that productions have been modified so that for
1 i k, Ai Aj R only if j > i;
(b) If Ak Aj is a production with j < k, generate a new set of
productions substituting for the Aj the rhs of each Aj production;
(c) Repeating (b) at most k 1 times we obtain rules of the form
Ak Ap , p k;
(d) Replace rules Ak Ak by removing left-recursive rules.

Computation Theory p.19/27

Facts
Let m be the highest order nonterminal generated
during Greibach normal form construction algorithm.
Note: if no left-recursive rules (Ai Ai , for some 1 i m) are in the
grammar then we have:
1. The leftmost symbol on the rhs of each rule for Am must be a terminal,
since Am is the highest order variable.
2. The leftmost symbol on the rhs of any rule for Am1 must be either
Am or a terminal symbol. If it is Am we can generate new rules
replacing Am with the rhs of the Am rules.
3. Repeating (2) for Am2 , . . . , A2 , A1 we obtain only rules whose rhs
start with terminal symbols.

Computation Theory p.20/27

Left recursion
A CFG containing rules of the form
A A|
is called left-recursive in A.
The language generated by such rules is of the form

A n . If we replace the rules A A| by


A B|, B B|
where B is a new variable, then the language generated
by A is the same while no left-recursive A-rules are
used in the derivation.

Computation Theory p.21/27

Removing left-recursion
Left-recursion can be eliminated by the following
scheme:
If A A1 |A2 . . . |Ar are all A left recursive rules, and
A 1 |2 | . . . |s are all remaining A-rules then chose a new
nonterminal, say B;
Add the new B-rules B i |i B, 1 i r;
Replace the A-rules by A i |i B, 1 i s.

This construction preserve the language L.

Computation Theory p.22/27

Conversion algorithm
for (k=1; k<=m; k++)
{for (j=1; j<=k-1; j++)
{for (each A_k ---> A_j alpha)
{
for all rules A_j ---> beta
add A_k ---> beta alpha
remove A_k ---> A_j alpha
}
for (each rule A_k ---> A_k alpha)
{
add rules B_k ---> alpha | alpha B_k
remove A_k ---> A_k alpha
}
for (each rule A_k ---> beta, beta does not begin with A_k)
add rule A_k ---> beta B_k
}
}

Computation Theory p.23/27

More on Greibach NF
See Introduction to Automata Theory, Languages, and
Computation, J.E, Hopcroft and J.D Ullman, AddisonWesley 1979, p. 9496.

Computation Theory p.24/27

Example
Convert the CFG
G = ({A1 , A2 , A3 }, {a, b}, R, A1 )

where
R = {A1 A2 A3 , A2 A3 A1 |b, A3 A1 A2 |a}

into Greibach normal form.

Computation Theory p.25/27

Solution
1. Step 1: ordering the rules: (Only A3 rules violate ordering conditions,
hence only A3 rules need to be changed).
Following the procedure we replace A3 rules by:
A3 A3 A1 A3 A2 |bA3 A2 |a
2. Eliminating left-recursion we get: A3 bA3 A2 B3 |aB3 |bA3 A2 |a,
B3 A1 A3 A2 |A1 A3 A2 B3 ;
3. All A3 rules start with a terminal. We use them to replace A1 A2 A3 .
This introduces the rules B3 A1 A3 A2 |A1 A3 A2 B3 ;
4. Use A1 production to make them start with a terminal.

Computation Theory p.26/27

The result
A3
A2
A1
B3
B3
B3
B3

bA3 A2 B3 |bA3 A2 |aB3 |a


bA3 A2 B3 A1 |bA3 A2 A1 |aB3 A1 |aA1 |b
bA3 A2 B3 A1 A3 |bA3 A2 A1 A3 |aB3 A1 A3 |aA1 A3 |bA3
bA3 A2 B3 A1 A3 A3 A2 B3 |bA3 A2 B3 A1 A3 A1 A3 A3 A2
bA3 A3 A2 B3 |bA3 A3 A2 |bA3 A2 A1 A3 A3 A2 B3 |bA3 A2 A1 A3 A3 A2
aB3 A1 A3 A3 A2 B3 |aB3 A1 A3 A3 A2
aA1 A3 A3 A2 B3 |aA1 A3 A3 A2

Disclaimer: I copied this result from Hopcroft and Ulman, page 98-99!
Hence, I cannot guarantee its correctness.

Computation Theory p.27/27