**CORTES, MBA, MPA, BSCS, ACS Fall 2008**

CSC 3130: Automata theory and formal languages

Normal forms and parsing

Andrej Bogdanov

http://www.cse.cuhk.edu.hk/~andrejb/csc3130

**Testing membership and parsing**

• Given a grammar

S → 0S1 | 1S0S1 | T
T → S | ε

• How can we know if a string x is in its language?
• If so, can we reconstruct a parse tree for x?

First attempt

S → 0S1 | 1S0S1 | T
T → S | ε

x = 00111

**• Maybe we can try all possible derivations:**

– S ⇒ 0S1, then 0S1 ⇒ 00S11, 0S1 ⇒ 01S0S11, 0S1 ⇒ 0T1
– S ⇒ 1S0S1, then 1S0S1 ⇒ 10S10S1, ...
– S ⇒ T, then T ⇒ S

When do we stop?

Problems

S → 0S1 | 1S0S1 | T
T → S | ε

x = 00111

**• How do we know when to stop?**

– S ⇒ 0S1, then 0S1 ⇒ 00S11, 0S1 ⇒ 01S0S11, 0S1 ⇒ 0T1
– S ⇒ 1S0S1, then 1S0S1 ⇒ 10S10S1, ...

Problems

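S → 0S1 | 1S0S1 | T
T → S | ε

x = 01011

**• Idea: Stop derivation when length exceeds |x|**

• Not right because of ε-productions

S (1) ⇒ 0S1 (3) ⇒ 01S0S11 (7) ⇒ 01S011 (6) ⇒ 01011 (5)

The number in parentheses is the length of each sentential form. The last two steps each erase an S via S ⇒ T ⇒ ε, so a form longer than |x| can still shrink back to x.

• We might want to eliminate ε-productions too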

Problems

S → 0S1 | 1S0S1 | T
T → S | ε

x = 00111

• Loops among the variables (S → T → S) might make us go forever
• We might want to eliminate such loops

Unit productions

• A unit production is a production of the form

A1 → A2

where A1 and A2 are both variables

• Example

grammar:

S → 0S1 | 1S0S1 | T
T → S | R | ε
R → 0SR

unit productions: S → T, T → S, T → R

**Removal of unit productions**

• If there is a cycle of unit productions

A1 → A2 → ... → Ak → A1

delete the cycle and replace every occurrence of A2, ..., Ak with A1

• Example

before:

S → 0S1 | 1S0S1 | T
T → S | R | ε
R → 0SR

after:

S → 0S1 | 1S0S1
S → R | ε
R → 0SR

T is replaced by S in the {S, T} cycle

**Removal of unit productions**

• For other unit productions, replace every chain

A1 → A2 → ... → Ak → α

by productions A1 → α, ..., Ak → α

• Example

before:

S → 0S1 | 1S0S1 | R | ε
R → 0SR

after:

S → 0S1 | 1S0S1 | 0SR | ε
R → 0SR

S → R → 0SR is replaced by S → 0SR, R → 0SR
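The chain replacement above can be sketched in Python. This is only an illustration under assumptions not in the slides: a grammar is a dict from variable to a set of bodies, each body is a tuple of symbols, and ε is the empty tuple. The function name `remove_unit_productions` is hypothetical. Instead of collapsing cycles first, the sketch computes which variables are reachable through unit productions and copies their non-unit bodies, which gives the same language.

```python
def remove_unit_productions(grammar):
    """Return an equivalent grammar with no productions of the form A -> B."""
    variables = set(grammar)

    def is_unit(body):
        # A unit production has a body consisting of exactly one variable.
        return len(body) == 1 and body[0] in variables

    result = {}
    for head in grammar:
        # Collect every variable reachable from head using unit productions only.
        reachable = {head}
        stack = [head]
        while stack:
            var = stack.pop()
            for body in grammar[var]:
                if is_unit(body) and body[0] not in reachable:
                    reachable.add(body[0])
                    stack.append(body[0])
        # head inherits every non-unit body of the variables it can reach.
        result[head] = {
            body
            for var in reachable
            for body in grammar[var]
            if not is_unit(body)
        }
    return result


# The example from the slide: S -> 0S1 | 1S0S1 | R | ε, R -> 0SR
grammar = {
    "S": {("0", "S", "1"), ("1", "S", "0", "S", "1"), ("R",), ()},
    "R": {("0", "S", "R")},
}
cleaned = remove_unit_productions(grammar)
```

Unlike the cycle-collapsing step on the earlier slide, this sketch keeps every variable of a cycle as a separate variable with identical productions rather than merging them into one.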

Removal of ε-productions

• A variable N is nullable if there is a derivation

N ⇒* ε

**• How to remove ε-productions (except from S)**

Find all nullable variables N1, ..., Nk
For i = 1 to k
  For every production of the form A → αNiβ, add another production A → αβ
  If Ni → ε is a production, remove it
If S is nullable, add the special production S → ε

Example

• Find the nullable variables

grammar:

S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b

nullable variables: B, C, D

Find all nullable variables N1, ..., Nk

**Finding nullable variables**

• To find nullable variables, we work backwards

– First, mark all variables A s.t. A → ε as nullable
– Then, as long as there are productions of the form A → A1…Ak where all of A1, …, Ak are marked as nullable, mark A as nullable
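The marking procedure can be sketched as a fixed-point loop. This is a minimal illustration, assuming a grammar is stored as a dict from variable to a set of bodies, with each body a tuple of symbols and ε the empty tuple; the name `nullable_variables` is hypothetical.

```python
def nullable_variables(grammar):
    """Return the set of variables that can derive the empty string."""
    # First pass: variables with an explicit A -> ε production.
    nullable = {head for head, bodies in grammar.items() if () in bodies}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # Mark A if some body consists only of already-marked variables.
            # Terminals are never in `nullable`, so they block the marking.
            if any(body and all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


# The grammar from the example slide.
grammar = {
    "S": {("A", "C", "D")},
    "A": {("a",)},
    "B": {()},
    "C": {("E", "D"), ()},
    "D": {("B", "C"), ("b",)},
    "E": {("b",)},
}
print(sorted(nullable_variables(grammar)))  # ['B', 'C', 'D']
```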

Eliminating ε-productions

grammar:

S → ACD
A → a
B → ε
C → ED | ε
D → BC | b
E → b

nullable variables: B, C, D

added productions: D → C, S → AD, D → B, D → ε, S → AC, S → A, C → E

For i = 1 to k
  For every production of the form A → αNiβ, add another production A → αβ
  If Ni → ε is a production, remove it
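A compact way to sketch this step in Python is shown below. It is not the slide's loop verbatim: rather than processing N1, ..., Nk one at a time, it drops every subset of nullable occurrences from each body in one pass, which produces the same productions on this example. The grammar encoding (dict of variable to set of tuples, ε as the empty tuple) and the function names are assumptions made for illustration.

```python
from itertools import combinations


def nullable_variables(grammar):
    """Variables that derive ε, found by repeated marking."""
    nullable = {head for head, bodies in grammar.items() if () in bodies}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(
                body and all(s in nullable for s in body) for body in bodies
            ):
                nullable.add(head)
                changed = True
    return nullable


def eliminate_epsilon(grammar, start):
    """Remove ε-productions, keeping start -> ε only if start is nullable."""
    nullable = nullable_variables(grammar)
    result = {head: set() for head in grammar}
    for head, bodies in grammar.items():
        for body in bodies:
            # Positions in the body that hold a nullable variable.
            positions = [i for i, sym in enumerate(body) if sym in nullable]
            for r in range(len(positions) + 1):
                for dropped in combinations(positions, r):
                    new_body = tuple(
                        sym for i, sym in enumerate(body) if i not in dropped
                    )
                    if new_body:  # never keep an ε body here
                        result[head].add(new_body)
    if start in nullable:
        result[start].add(())  # the special production S -> ε
    return result


grammar = {
    "S": {("A", "C", "D")},
    "A": {("a",)},
    "B": {()},
    "C": {("E", "D"), ()},
    "D": {("B", "C"), ("b",)},
    "E": {("b",)},
}
cleaned = eliminate_epsilon(grammar, "S")
```

On this grammar, B ends up with no productions at all, because its only production was B → ε.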

Recap

• After eliminating ε-productions and unit productions, we know that every derivation

S ⇒* a1…ak

where a1, …, ak are terminals, doesn’t shrink in length and doesn’t go into cycles

• Exception: S → ε

– We will not use this rule at all, except to check if ε ∈ L

• Note

ε-productions must be eliminated before unit productions, since removing ε-productions can create new unit productions (for example, D → BC with B nullable gives D → C)

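**Example: testing membership**

S → 0S1 | 1S0S1 | T
T → S | ε

eliminate unit, ε-productions

S → ε | 01 | 101 | 0S1 | 10S1 | 1S01 | 1S0S1

x = 00111

– S ⇒ 01, S ⇒ 101
– S ⇒ 0S1, which yields 0011, 01011, 00S11 (only strings of length ≥ 6), and otherwise strings of length ≥ 6
– S ⇒ 10S1, which yields 10011 and otherwise strings of length ≥ 6
– S ⇒ 1S01, which yields 10101 and otherwise strings of length ≥ 6
– S ⇒ 1S0S1, which yields only strings of length ≥ 6

None of the strings of length 5 derived this way equals 00111, so x is not in the language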

**Algorithm 1 for testing membership**

• We can now use the following algorithm to check if a string x is in the language of G

Eliminate all ε-productions and unit productions
If x = ε and S → ε, accept; else delete S → ε
Let X := S
While some new production P can be applied to X
  Apply P to X
  If X = x, accept
  If |X| > |x|, backtrack
If no more productions can be applied to X, reject
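A minimal Python sketch of this backtracking search follows. It assumes the grammar has already had its ε-productions and unit productions removed (apart from S → ε), and it uses an assumed encoding: a dict from variable to a set of bodies, each body a tuple of symbols. The name `derives` is hypothetical. The sketch always expands the leftmost variable, which is enough because every derivable string has a leftmost derivation, and it remembers visited sentential forms so none is explored twice.

```python
def derives(grammar, start, x):
    """Return True if x is in the language, by exhaustive pruned search."""
    variables = set(grammar)
    target = tuple(x)
    if not target:
        # The rule S -> ε is used only to decide whether ε is in the language.
        return () in grammar[start]
    seen = set()
    stack = [(start,)]  # sentential forms still to explore
    while stack:
        form = stack.pop()
        if form == target:
            return True
        # Backtrack on forms that are too long or already explored.
        if len(form) > len(target) or form in seen:
            continue
        seen.add(form)
        # Locate the leftmost variable, if the form still contains one.
        idx = next((i for i, sym in enumerate(form) if sym in variables), None)
        if idx is None:
            continue  # all terminals but not equal to x
        for body in grammar[form[idx]]:
            if body:  # S -> ε is not applied during the search
                stack.append(form[:idx] + body + form[idx + 1:])
    return False


# Grammar from the membership example, after elimination.
grammar = {
    "S": {
        (), ("0", "1"), ("1", "0", "1"), ("0", "S", "1"),
        ("1", "0", "S", "1"), ("1", "S", "0", "1"), ("1", "S", "0", "S", "1"),
    }
}
print(derives(grammar, "S", "00111"))  # False
print(derives(grammar, "S", "01011"))  # True
```

The search terminates because no production shrinks a sentential form, so only finitely many forms of length at most |x| are ever explored.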

**Practical limitations of Algorithm 1**

• Previous algorithm can be very slow if x is long

G = CFG of the Java programming language
x = code for a 200-line Java program

algorithm might take about 10^200 steps!

• There is a faster algorithm, but it requires that we do some more transformations on the grammar

**Chomsky Normal Form**

• A grammar is in Chomsky Normal Form if every production (except possibly S → ε) is of the type

A → BC

or

A → a

**• Conversion to Chomsky Normal Form is easy:**

A → BcDE

replace terminals with new variables

A → BCDE
C → c

break up sequences with new variables

A → BX1
X1 → CX2
X2 → DE
C → c
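The two conversion steps can be sketched as follows. This is an illustration under stated assumptions: ε-productions and unit productions have already been removed, the grammar is a dict from variable to a set of tuple bodies, and the fresh variable names (`T_c` for terminal c, `X1`, `X2`, ...) are hypothetical and assumed not to clash with existing variables.

```python
def to_cnf(grammar):
    """Convert a grammar without ε- or unit productions to Chomsky Normal Form."""
    variables = set(grammar)
    result = {head: set() for head in grammar}
    terminal_var = {}  # terminal -> name of the variable that stands for it
    counter = 0        # numbering for the fresh X variables

    def var_for(terminal):
        # Step 1 helper: create (once) a variable T_c with the production T_c -> c.
        if terminal not in terminal_var:
            name = f"T_{terminal}"
            terminal_var[terminal] = name
            result[name] = {(terminal,)}
        return terminal_var[terminal]

    for head, bodies in grammar.items():
        for body in sorted(bodies):
            if len(body) <= 1:
                result[head].add(body)  # A -> a (or S -> ε) is already fine
                continue
            # Step 1: replace terminals inside long bodies with new variables.
            symbols = [s if s in variables else var_for(s) for s in body]
            # Step 2: break up sequences longer than two with new variables.
            current = head
            while len(symbols) > 2:
                counter += 1
                fresh = f"X{counter}"
                result[current].add((symbols[0], fresh))
                result[fresh] = set()
                current = fresh
                symbols = symbols[1:]
            result[current].add(tuple(symbols))
    return result


# The slide's example A -> BcDE; B, D, E get placeholder productions here.
grammar = {
    "A": {("B", "c", "D", "E")},
    "B": {("b",)},
    "D": {("d",)},
    "E": {("e",)},
}
cnf = to_cnf(grammar)
# cnf["A"] is {("B", "X1")}, cnf["X1"] is {("T_c", "X2")}, cnf["X2"] is {("D", "E")}
```

The slide names the new terminal variable C; the sketch uses `T_c` instead only to lower the chance of colliding with a variable the grammar already has.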

Exercise

• Convert this CFG into Chomsky Normal Form:

S → ε | ADDA
A → a
C → c
D → bCb

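**Algorithm 2 for testing membership**

S → AB | BC
A → BA | a
B → CC | b
C → AB | a

x = baaba

Variables that derive each substring of x, grouped by substring length (– marks an empty cell):

length 5: baaba: S, A, C
length 4: baab: –; aaba: S, A, C
length 3: baa: –; aab: B; aba: B
length 2: ba: S, A; aa: B; ab: S, C; ba: S, A
length 1: b: B; a: A, C; a: A, C; b: B; a: A, C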

Idea: We generate each substring of x bottom up

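**Parse tree reconstruction**

S → AB | BC
A → BA | a
B → CC | b
C → AB | a

x = baaba

Variables that derive each substring of x, grouped by substring length (– marks an empty cell):

length 5: baaba: S, A, C
length 4: baab: –; aaba: S, A, C
length 3: baa: –; aab: B; aba: B
length 2: ba: S, A; aa: B; ab: S, C; ba: S, A
length 1: b: B; a: A, C; a: A, C; b: B; a: A, C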

Tracing back the derivations, we obtain the parse tree
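One way to make the tracing-back step concrete is to store, for each variable placed in a cell, how it was derived. The Python sketch below does this under assumptions not in the slides: the grammar is a dict from variable to a set of tuple bodies, cells are indexed by 1-based inclusive positions (s, t), and the names `cyk_parse` and `tree_yield` are hypothetical. It returns one parse tree; a string may have several.

```python
def cyk_parse(grammar, start, x):
    """Return one parse tree for x as nested tuples, or None if x is not derived."""
    k = len(x)
    back = {}  # back[(s, t)][A] records how A derives x_s ... x_t
    for i in range(1, k + 1):
        back[(i, i)] = {
            head: x[i - 1]
            for head, bodies in grammar.items()
            if (x[i - 1],) in bodies
        }
    for b in range(2, k + 1):              # b is the substring length
        for s in range(1, k - b + 2):
            t = s + b - 1
            cell = {}
            for j in range(s, t):          # split into x_s..x_j and x_{j+1}..x_t
                for head, bodies in grammar.items():
                    for body in bodies:
                        if (len(body) == 2
                                and body[0] in back[(s, j)]
                                and body[1] in back[(j + 1, t)]):
                            # Keep the first derivation found for this variable.
                            cell.setdefault(head, (j, body[0], body[1]))
            back[(s, t)] = cell

    def build(var, s, t):
        entry = back[(s, t)][var]
        if s == t:
            return (var, entry)            # leaf: (variable, terminal)
        j, left, right = entry
        return (var, build(left, s, j), build(right, j + 1, t))

    if k == 0 or start not in back[(1, k)]:
        return None
    return build(start, 1, k)


def tree_yield(tree):
    """Concatenate the terminals at the leaves of a parse tree."""
    if isinstance(tree[1], str):
        return tree[1]
    return tree_yield(tree[1]) + tree_yield(tree[2])


grammar = {
    "S": {("A", "B"), ("B", "C")},
    "A": {("B", "A"), ("a",)},
    "B": {("C", "C"), ("b",)},
    "C": {("A", "B"), ("a",)},
}
tree = cyk_parse(grammar, "S", "baaba")
print(tree_yield(tree))  # baaba
```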

**Cocke-Younger-Kasami algorithm**

Input: Grammar G in CNF, string x = x1…xk

For i = 1 to k
  If there is a production A → xi
    Put A in table cell ii
For b = 2 to k
  For s = 1 to k – b + 1
    Set t = s + b – 1
    For j = s to t – 1
      If there is a production A → BC where B is in cell sj and C is in cell (j+1)t
        Put A in cell st

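Cell ij remembers all possible derivations of substring xi…xj

[Figure: triangular table with cells 11, 22, …, kk in the bottom row above x1, x2, …, xk, cells 12, 23, … in the next row, and cell 1k at the top; s, j and t mark positions and b the substring length]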
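The table-filling procedure can be sketched in Python as below. It assumes the grammar is in Chomsky Normal Form and is stored as a dict from variable to a set of tuple bodies; cell (s, t) uses 1-based inclusive positions, so a substring of length b starting at s ends at t = s + b - 1. The names `cyk_table` and `cyk_accepts` are hypothetical.

```python
def cyk_table(grammar, x):
    """Fill the CYK table: table[(s, t)] is the set of variables deriving x_s ... x_t."""
    k = len(x)
    table = {}
    # Substrings of length 1: variables with a production A -> x_i.
    for i in range(1, k + 1):
        table[(i, i)] = {
            head for head, bodies in grammar.items() if (x[i - 1],) in bodies
        }
    # Longer substrings, shortest first.
    for b in range(2, k + 1):
        for s in range(1, k - b + 2):
            t = s + b - 1
            cell = set()
            for j in range(s, t):  # left part is x_s..x_j, right part is x_{j+1}..x_t
                for head, bodies in grammar.items():
                    for body in bodies:
                        if (len(body) == 2
                                and body[0] in table[(s, j)]
                                and body[1] in table[(j + 1, t)]):
                            cell.add(head)
            table[(s, t)] = cell
    return table


def cyk_accepts(grammar, start, x):
    """Return True if the start variable derives x."""
    if not x:
        return () in grammar[start]  # only S -> ε can produce the empty string
    return start in cyk_table(grammar, x)[(1, len(x))]


grammar = {
    "S": {("A", "B"), ("B", "C")},
    "A": {("B", "A"), ("a",)},
    "B": {("C", "C"), ("b",)},
    "C": {("A", "B"), ("a",)},
}
table = cyk_table(grammar, "baaba")
print(sorted(table[(1, 5)]))                 # ['A', 'C', 'S']
print(cyk_accepts(grammar, "S", "baaba"))    # True
```

With three nested loops over positions, the number of cell lookups grows as k cubed times the number of productions, in contrast to the exhaustive search of Algorithm 1.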

