
MELJUN P. CORTES, MBA, MPA, BSCS, ACS

Fall 2008
CSC 3130: Automata theory and formal languages

Normal forms and parsing

Andrej Bogdanov
http://www.cse.cuhk.edu.hk/~andrejb/csc3130

Testing membership and parsing
• Given a grammar
S → 0S1 | 1S0S1 | T
T → S | ε

• How can we know if a string x is in its language?
• If so, can we reconstruct a parse tree for x?

First attempt
S → 0S1 | 1S0S1 | T T→S| x = 00111

• Maybe we can try all possible derivations:
S 0S1 00S11 01S0S11 0T1 10S10S1 S 

1S0S1 ... T

when do we stop?

Problems
S → 0S1 | 1S0S1 | T T→S| x = 00111

• How do we know when to stop?
S 0S1 00S11 01S0S11 0T1 10S10S1

when do we stop?

1S0S1 ...

Problems
S → 0S1 | 1S0S1 | T T→S| x = 01011

• Idea: Stop derivation when length exceeds |x|
• Not right because of -productions
S  0S1  01S0S11  01S011  01011
1 3 7 6 5

• We might want to eliminate -productions too

Problems
S → 0S1 | 1S0S1 | T T→S| x = 00111

• Loops among the variables (S → T → S) might make us go forever • We might want to eliminate such loops

Unit productions
• A unit production is a production of the form A1 → A2, where A1 and A2 are both variables
• Example
grammar:
S → 0S1 | 1S0S1 | T
T → S | R | ε
R → 0SR
unit productions: S → T, T → S, T → R

Removal of unit productions
• If there is a cycle of unit productions A1 → A2 → ... → Ak → A1, delete it and replace every occurrence of A2, ..., Ak with A1
• Example
S → 0S1 | 1S0S1 | T
T → S | R | ε
R → 0SR
becomes
S → 0S1 | 1S0S1 | R | ε
R → 0SR

T is replaced by S in the {S, T} cycle, and the useless production S → S is dropped

Removal of unit productions
• For other unit productions, replace every chain A1 → A2 → ... → Ak → α, where α is not a single variable, by the productions A1 → α, ..., Ak → α
• Example
S → 0S1 | 1S0S1 | R | ε
R → 0SR
becomes
S → 0S1 | 1S0S1 | 0SR | ε
R → 0SR

S → R → 0SR is replaced by S → 0SR, R → 0SR
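Both rules can be sketched in Python. This is a minimal sketch, assuming a hypothetical representation in which a grammar is a dict from each variable to a set of right-hand sides (tuples of symbols, with () standing for ε). Instead of collapsing cycles first, it uses the equivalent "unit pair" formulation: each variable A inherits every non-unit right-hand side of every variable reachable from A through unit productions, which handles cycles and chains in one pass. The function names are illustrative only.

```python
def is_unit(body, grammar):
    """A unit body is a single symbol that is a variable of the grammar."""
    return len(body) == 1 and body[0] in grammar


def remove_unit_productions(grammar):
    """Return an equivalent grammar with no unit productions.

    grammar: dict mapping each variable to a set of right-hand sides
    (tuples of symbols; () stands for ε).
    """
    result = {}
    for a in grammar:
        # All variables reachable from a via unit productions (a included).
        reach, stack = {a}, [a]
        while stack:
            b = stack.pop()
            for body in grammar[b]:
                if is_unit(body, grammar) and body[0] not in reach:
                    reach.add(body[0])
                    stack.append(body[0])
        # a inherits every non-unit right-hand side of the variables it reaches.
        result[a] = {body for b in reach for body in grammar[b]
                     if not is_unit(body, grammar)}
    return result


# The cycle example: S -> 0S1 | 1S0S1 | T, T -> S | R | ε, R -> 0SR
g = {
    "S": {tuple("0S1"), tuple("1S0S1"), ("T",)},
    "T": {("S",), ("R",), ()},
    "R": {tuple("0SR")},
}
for body in sorted(remove_unit_productions(g)["S"]):
    print("S ->", "".join(body) or "ε")
```

This prints S -> ε, S -> 0S1, S -> 0SR and S -> 1S0S1, the same productions for S as in the slides. Unlike the slides, the sketch keeps T as a variable; it simply becomes unreachable from S.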

Removal of ε-productions
• A variable N is nullable if there is a derivation N ⇒* ε

• How to remove ε-productions (except from S)
– Find all nullable variables N1, ..., Nk
– For i = 1 to k
    For every production of the form A → αNiβ (α, β any strings of symbols), add another production A → αβ
    If Ni → ε is a production, remove it
– If S is nullable, add the special production S → ε

Example
• Find the nullable variables
grammar:
S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b
nullable variables: B, C, D

– Find all nullable variables N1, ..., Nk

Finding nullable variables
• To find nullable variables, we work backwards
– First, mark as nullable every variable A such that A → ε is a production
– Then, as long as there is a production of the form A → A1…Ak where all of A1, …, Ak are marked as nullable, mark A as nullable
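The backward marking above is a fixed-point computation. A minimal Python sketch, assuming a hypothetical representation in which a grammar is a dict from each variable to a set of right-hand sides (tuples of symbols, with () for ε); the name `nullable_variables` is illustrative:

```python
def nullable_variables(grammar):
    """Return the set of variables N with N =>* ε.

    grammar maps each variable to a set of right-hand sides; each
    right-hand side is a tuple of symbols, and () stands for ε.
    """
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # A is nullable if some body consists only of nullable variables;
            # the empty body () covers the base case A -> ε.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


example = {
    "S": {("A", "C", "D")},
    "A": {("a",)},
    "B": {()},
    "C": {("E", "D"), ()},
    "D": {("B", "C"), ("b",)},
    "E": {("b",)},
}
print(sorted(nullable_variables(example)))  # ['B', 'C', 'D']
```

The loop stops as soon as a full pass marks nothing new, so it runs at most once per variable plus one final pass.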

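Eliminating ε-productions
S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b
nullable variables: B, C, D

– i = 1 (B): from D → BC add D → C; remove B → ε
– i = 2 (C): from S → ACD, D → BC, D → C add S → AD, D → B, D → ε; remove C → ε
– i = 3 (D): from S → ACD, S → AD, C → ED add S → AC, S → A, C → E; remove D → ε
Result: S → ACD | AD | AC | A, A → a, C → ED | E, D → BC | C | B | b, E → b

– For i = 1 to k
    For every production of the form A → αNiβ, add another production A → αβ
    If Ni → ε is a production, remove it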
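The loop above can be sketched in Python. This is a minimal sketch under the same kind of hypothetical dict-of-sets representation (tuples of symbols, () for ε); the function names are illustrative. For each nullable Ni it keeps adding shortened copies until nothing new appears, so a body with several occurrences of Ni gets every combination. As a safeguard not spelled out in the slides, it also drops any ε-body created along the way before re-adding S → ε when S is nullable.

```python
def nullable_variables(grammar):
    """Variables N with N =>* ε, found by backward marking."""
    nullable, changed = set(), True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(
                    all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


def eliminate_epsilon(grammar, start):
    """Remove ε-productions, keeping start -> ε only if start is nullable."""
    nullable = nullable_variables(grammar)
    g = {head: set(bodies) for head, bodies in grammar.items()}
    for n in sorted(nullable):
        # For every A -> αNβ add A -> αβ; repeat until nothing new appears.
        changed = True
        while changed:
            changed = False
            for bodies in g.values():
                for body in list(bodies):
                    for p, sym in enumerate(body):
                        shorter = body[:p] + body[p + 1:]
                        if sym == n and shorter not in bodies:
                            bodies.add(shorter)
                            changed = True
        g[n].discard(())  # remove N -> ε
    # Safeguard: drop any ε-body created along the way (such as D -> ε).
    for bodies in g.values():
        bodies.discard(())
    if start in nullable:
        g[start].add(())  # the special production S -> ε
    return g


example = {
    "S": {("A", "C", "D")},
    "A": {("a",)},
    "B": {()},
    "C": {("E", "D"), ()},
    "D": {("B", "C"), ("b",)},
    "E": {("b",)},
}
result = eliminate_epsilon(example, "S")
print(sorted(result["S"]))  # [('A',), ('A', 'C'), ('A', 'C', 'D'), ('A', 'D')]
```

The other variables come out as C → ED | E and D → BC | C | B | b, while B is left with no productions at all.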

Recap
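• After eliminating ε-productions and unit productions, we know that every derivation
S ⇒* a1…ak
where a1, …, ak are terminals,
never shrinks in length and never goes into a cycle: each step either makes the sentential form longer or turns a variable into a terminal

• Exception: S → ε
– We will not use this rule at all, except to check whether ε ∈ L

• Note
– ε-productions must be eliminated before unit productions, because eliminating ε-productions can create new unit productions (for example, D → BC with B nullable yields D → C)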

Example: testing membership
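S → 0S1 | 1S0S1 | T
T → S | ε
eliminate unit and ε-productions:
S → ε | 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1

x = 00111 (|x| = 5)
S ⇒ 01, 101: terminal strings, neither equals x
S ⇒ 0S1 ⇒ 0011, 01011; 00S11 yields only strings of length ≥ 6; the other choices give forms of length ≥ 6
S ⇒ 10S1 ⇒ 10011; the other choices give strings of length ≥ 6
S ⇒ 1S01 ⇒ 10101; the other choices give strings of length ≥ 6
S ⇒ 1S0S1 yields only strings of length ≥ 6
No branch produces x, so 00111 is not in the language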

Algorithm 1 for testing membership
• We can now use the following algorithm to check if a string x is in the language of G
– Eliminate all ε-productions and unit productions
– If x = ε and S → ε is a production, accept; else delete S → ε
– Let X := S
– While some new production P can be applied to X
    Apply P to X
    If X = x, accept
    If |X| > |x|, backtrack
– If no more productions can be applied to X, reject
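Algorithm 1 can be sketched as a recursive backtracking search. This is a minimal Python sketch, assuming a hypothetical dict-of-sets grammar representation (tuples of symbols, () for ε) and a grammar already cleaned of ε- and unit productions except possibly S → ε. As a choice of this sketch, it only expands the leftmost variable; nothing is lost, because every derivable string has a leftmost derivation. The name `derives` is illustrative.

```python
def derives(grammar, start, x):
    """Algorithm 1: backtracking search for a derivation start =>* x.

    Assumes ε- and unit productions were already eliminated, except
    possibly start -> ε. grammar maps variables to sets of tuples of symbols.
    """
    if x == "":
        return () in grammar[start]
    # Delete start -> ε: it is only used for the x = ε check above.
    rules = {a: [b for b in bodies if b != ()] for a, bodies in grammar.items()}

    def search(form):
        if len(form) > len(x):
            return False  # backtrack: no later step can shrink the form
        for i, sym in enumerate(form):
            if sym in rules:
                # Try every production for the leftmost variable.
                return any(search(form[:i] + body + form[i + 1:])
                           for body in rules[sym])
        return "".join(form) == x  # no variables left: compare with x

    return search((start,))


# The example grammar after eliminating unit and ε-productions.
G = {"S": {tuple(b) for b in ["", "01", "101", "0S1", "10S1", "1S01", "1S0S1"]}}
print(derives(G, "S", "00111"))  # False
print(derives(G, "S", "01011"))  # True
```

The length bound guarantees termination, since every step either lengthens the form or replaces a variable by a terminal, but the number of forms explored can still grow exponentially with |x|.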

Practical limitations of Algorithm 1
• The previous algorithm can be very slow if x is long
G = CFG of the Java programming language
x = code for a 200-line Java program

the algorithm might take about 10^200 steps!

• There is a faster algorithm, but it requires that we do some more transformations on the grammar

Chomsky Normal Form
• A grammar is in Chomsky Normal Form if every production (except possibly S → ε) is of the type
A → BC
or
A → a
where A, B, C are variables and a is a terminal

• Conversion to Chomsky Normal Form is easy once ε-productions and unit productions have been eliminated:
A → BcDE
replace terminals with new variables
A → BCDE, C → c
break up sequences with new variables
A → BX1, X1 → CX2, X2 → DE, C → c
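The two conversion steps can be sketched in Python. This is a minimal sketch, assuming a hypothetical dict-of-sets grammar representation (tuples of symbols) with no ε-productions other than S → ε and no unit productions. The fresh variable names `T_c` (for a terminal c) and `X1`, `X2`, … are hypothetical choices of this sketch, assumed not to clash with existing variables; the slides use C for c instead.

```python
def to_cnf(grammar):
    """Convert a grammar with no ε-productions (other than start -> ε) and
    no unit productions into Chomsky Normal Form."""
    new = {a: set() for a in grammar}
    term_var = {}  # terminal -> the new variable standing for it
    counter = 0

    def var_for(t):
        # Step 1: replace a terminal t inside a long body with a variable.
        if t not in term_var:
            term_var[t] = f"T_{t}"
            new[term_var[t]] = {(t,)}
        return term_var[t]

    for a, bodies in grammar.items():
        for body in bodies:
            if len(body) <= 1:  # A -> a or start -> ε: already allowed
                new[a].add(body)
                continue
            syms = [s if s in grammar else var_for(s) for s in body]
            # Step 2: break A -> B1 B2 ... Bm into a chain of binary rules.
            head = a
            while len(syms) > 2:
                counter += 1
                x = f"X{counter}"
                new[head].add((syms[0], x))
                new[x] = set()
                head, syms = x, syms[1:]
            new[head].add(tuple(syms))
    return new


g = {"A": {("B", "c", "D", "E")}, "B": {("b",)}, "D": {("d",)}, "E": {("e",)}}
cnf = to_cnf(g)
for head in ["A", "X1", "X2", "T_c"]:
    print(head, "->", cnf[head])
```

For A → BcDE this gives A → B X1, X1 → T_c X2, X2 → DE and T_c → c, matching the slide's result up to the name of the terminal variable.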

Exercise
• Convert this CFG into Chomsky Normal Form:
S → ε | ADDA
A → a
C → c
D → bCb

Algorithm 2 for testing membership
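S → AB | BC
A → BA | a
B → CC | b
C → AB | a

x = baaba

Table (cell ij lists the variables that derive xi…xj; – means none):
length 1: 11: B   22: A, C   33: A, C   44: B   55: A, C
length 2: 12: S, A   23: B   34: S, C   45: S, A
length 3: 13: –   24: B   35: B
length 4: 14: –   25: S, A, C
length 5: 15: S, A, C
S is in cell 15, so baaba is in the language

Idea: We generate each substring of x bottom up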

Parse tree reconstruction
S → AB | BC
A → BA | a
B → CC | b
C → AB | a

x = baaba

(figure: the same CYK table for baaba, with arrows tracing each variable back to the two cells it was built from)

Tracing back the derivations, we obtain the parse tree
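Tracing back works if each table entry also remembers how it was obtained. A minimal Python sketch, assuming a hypothetical representation of a CNF grammar as a dict from each variable to a list of bodies, (terminal,) or (B, C), and a nonempty x. It records, for each variable in each cell, the first split point and production found, and returns one parse tree as nested tuples (other valid trees may exist). The name `cyk_parse` is illustrative.

```python
def cyk_parse(grammar, start, x):
    """Return one parse tree for x as nested tuples, or None if x is not derivable.

    Leaves are (A, terminal); inner nodes are (A, left_subtree, right_subtree).
    """
    k = len(x)
    if k == 0:
        return None
    # back[(i, j)] maps each variable deriving x[i..j] (0-based, inclusive)
    # to how it was obtained: a terminal, or (split point, B, C).
    back = {}
    for i, ch in enumerate(x):
        back[(i, i)] = {a: ch for a, bodies in grammar.items() if (ch,) in bodies}
    for length in range(2, k + 1):
        for i in range(k - length + 1):
            j = i + length - 1
            cell = {}
            for m in range(i, j):  # split into x[i..m] and x[m+1..j]
                left, right = back[(i, m)], back[(m + 1, j)]
                for a, bodies in grammar.items():
                    for body in bodies:
                        if (a not in cell and len(body) == 2
                                and body[0] in left and body[1] in right):
                            cell[a] = (m, body[0], body[1])  # keep first way found
            back[(i, j)] = cell

    def build(a, i, j):
        # Follow the recorded choices back down to the terminals.
        if i == j:
            return (a, back[(i, i)][a])
        m, b, c = back[(i, j)][a]
        return (a, build(b, i, m), build(c, m + 1, j))

    return build(start, 0, k - 1) if start in back[(0, k - 1)] else None


G = {
    "S": [("A", "B"), ("B", "C")],
    "A": [("B", "A"), ("a",)],
    "B": [("C", "C"), ("b",)],
    "C": [("A", "B"), ("a",)],
}
print(cyk_parse(G, "S", "baaba"))
```

Storing one back-pointer per variable per cell keeps the table small; recording all of them instead would let the same table enumerate every parse tree.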

Cocke-Younger-Kasami algorithm
Input: Grammar G in CNF, string x = x1…xk

For i = 1 to k
  If there is a production A → xi
    Put A in table cell ii
For b = 2 to k
  For s = 1 to k - b + 1
    Set t = s + b - 1
    For j = s to t - 1
      If there is a production A → BC where B is in cell sj and C is in cell (j+1)t
        Put A in cell st

Cell ij holds every variable that can derive the substring xi…xj

(figure: triangular table with cells 11, 22, …, kk above x1, x2, …, xk and cell 1k at the top; cell st covers a substring of length b, split at position j)
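The pseudocode translates almost line for line into Python. A minimal sketch, assuming a hypothetical representation of a CNF grammar as a dict from each variable to a set of bodies, (terminal,) or (B, C), and a nonempty x; it keeps the slide's 1-based indices s, t, b and j. The name `cyk_table` is illustrative.

```python
def cyk_table(grammar, x):
    """Fill the CYK table for a nonempty string x = x1...xk (1-based cells).

    Returns a dict with cell[(s, t)] = set of variables deriving xs...xt.
    """
    k = len(x)
    cell = {}
    for i in range(1, k + 1):
        # Cell ii: every A with a production A -> xi.
        cell[(i, i)] = {a for a, bodies in grammar.items() if (x[i - 1],) in bodies}
    for b in range(2, k + 1):              # substring length
        for s in range(1, k - b + 2):      # s = 1 .. k - b + 1
            t = s + b - 1                  # substring xs...xt has length b
            cell[(s, t)] = {
                a
                for a, bodies in grammar.items()
                for body in bodies
                if len(body) == 2 and any(
                    body[0] in cell[(s, j)] and body[1] in cell[(j + 1, t)]
                    for j in range(s, t))  # split point j = s .. t - 1
            }
    return cell


G = {
    "S": {("A", "B"), ("B", "C")},
    "A": {("B", "A"), ("a",)},
    "B": {("C", "C"), ("b",)},
    "C": {("A", "B"), ("a",)},
}
table = cyk_table(G, "baaba")
print("S" in table[(1, 5)])  # True: baaba is in the language
```

The three nested loops over b, s and j give a running time of O(k^3) times the number of productions, in contrast with the exponential search of Algorithm 1.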