Chap 5 LL(1) Parsing

LL(1):

L = left-to-right scanning of the input
L = leftmost derivation
1 = 1-token lookahead

Parser generator: parsing becomes the easiest part, and modifying parsers is also convenient.


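Given the productions

$A \to \alpha_1$
$A \to \alpha_2$
...
$A \to \alpha_n$

During a (leftmost) derivation,

$\dots A \dots \Rightarrow \dots \alpha_1 \dots$ or $\Rightarrow \dots \alpha_2 \dots$ or ... or $\Rightarrow \dots \alpha_n \dots$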

Which route should we choose?

(Trial and error is not a good idea.)

→ Use the lookahead symbols.


Consider the situation: we are about to expand a nonterminal A, and there are several productions whose left-hand side is A:

$A \to \alpha_1$
$A \to \alpha_2$
...
$A \to \alpha_n$

We choose one of the productions based on the lookahead token. Which one should we choose?

Consider $\mathrm{First}(\alpha_1), \mathrm{First}(\alpha_2), \dots, \mathrm{First}(\alpha_n)$, and if $\alpha_i \Rightarrow^* \varepsilon$, then consider also $\mathrm{Follow}(A)$.


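Define

$\mathrm{predict}(A \to \alpha) = \begin{cases} (\mathrm{First}(\alpha) - \{\varepsilon\}) \cup \mathrm{Follow}(A) & \text{if } \varepsilon \in \mathrm{First}(\alpha) \\ \mathrm{First}(\alpha) & \text{otherwise} \end{cases}$

If the lookahead token $a \in \mathrm{predict}(A \to \alpha)$, then we use the production $A \to \alpha$ to expand A.

What if $a \in \mathrm{predict}(A \to \alpha_1)$ and $a \in \mathrm{predict}(A \to \alpha_2)$?

What if $a \notin \mathrm{predict}(A \to \alpha)$ for all productions $A \to \alpha$ whose left-hand side is A?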
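A minimal C sketch of the predict computation, assuming First and Follow have already been computed and that a set of terminals is encoded as a bitmask (one bit per terminal, at most 32). The names `TermSet`, `FirstSet`, `predict` and `selects` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: bit i is set when terminal number i is in the set. */
typedef uint32_t TermSet;

/* First(alpha) for a right-hand side alpha: its terminals plus a flag that is
   true when alpha can derive the empty string. */
typedef struct {
    TermSet terms;
    bool    has_epsilon;
} FirstSet;

/* predict(A -> alpha): First(alpha) without epsilon, plus Follow(A)
   when epsilon is in First(alpha). */
TermSet predict(FirstSet first_alpha, TermSet follow_A)
{
    TermSet p = first_alpha.terms;      /* epsilon is never stored in terms */
    if (first_alpha.has_epsilon)
        p |= follow_A;
    return p;
}

/* True when lookahead terminal a selects the production with this predict set. */
bool selects(TermSet predict_set, int a)
{
    assert(a >= 0 && a < 32);
    return (predict_set >> a) & 1u;
}
```

A production whose right-hand side can vanish inherits all of Follow(A), which is exactly where most LL(1) conflicts come from.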


Property of LL(1) grammars:

If a grammar is LL(1), then for any two distinct productions

$A \to \alpha$
$A \to \beta$

$\mathrm{First}(\alpha\,\mathrm{Follow}(A)) \cap \mathrm{First}(\beta\,\mathrm{Follow}(A)) = \emptyset$
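As a worked check, take the if-grammar G → S ;, S → if S E | other, E → else S | ε (it reappears in §5.7). From G → S ; and S → if S E we get Follow(E) = Follow(S) = {;, else}, and the condition fails for the two E-productions:

```latex
\begin{aligned}
\mathrm{First}(\texttt{else}\; S\; \mathrm{Follow}(E)) &= \{\texttt{else}\} \\
\mathrm{First}(\varepsilon\; \mathrm{Follow}(E)) &= \mathrm{Follow}(E) = \{\texttt{;},\ \texttt{else}\} \\
\{\texttt{else}\} \cap \{\texttt{;},\ \texttt{else}\} &= \{\texttt{else}\} \neq \emptyset
\end{aligned}
```

So that grammar is not LL(1); the shared token else is the dangling-else conflict.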


Figure 5.1: A Micro grammar in standard form

Exercise: given the FIRST and FOLLOW sets in Figures 5.2 and 5.3, calculate the predict set for each production.


§5.2 LL(1) Parse Table


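The predict() function may be represented as an LL(1) parse table

$T : V_N \times V_T \to P \cup \{\text{error}\}$

where $V_N$ is the set of nonterminals, $V_T$ the set of terminals, and $P$ the set of productions. Rows are nonterminals (A, B, ...) and columns are terminals (a, b, ...); each entry holds a production number (e.g., 3) or error.

$T[A, a] = \begin{cases} A \to \alpha & \text{if } a \in \mathrm{predict}(A \to \alpha) \\ \text{error} & \text{otherwise} \end{cases}$

A grammar is LL(1) iff every entry in the parse table contains either a single production or the error flag.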
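A sketch of filling the table from precomputed predict sets and counting conflicting entries. Nonterminals and terminals are numbered from 0, productions from 1, and 0 marks an error entry; these encodings, the size limits and the name `build_table` are assumptions. An entry predicted by two or more productions is marked `CONFLICT`, so a return value of 0 means the grammar is LL(1).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_NT   8      /* hypothetical size limits for this sketch */
#define MAX_T    32
#define ERROR    0      /* error entry; productions are numbered from 1 */
#define CONFLICT (-1)   /* entry predicted by more than one production */

typedef uint32_t TermSet;   /* bit a set when terminal a is in the set */

/* One production: its LHS nonterminal and its (already computed) predict set. */
typedef struct {
    int     lhs;
    TermSet predict;
} Production;

/* Fills T[A][a] with the number of the production selected by lookahead a.
   Returns the number of table entries that hold a conflict. */
int build_table(const Production *prods, int nprods, int T[MAX_NT][MAX_T])
{
    int conflicts = 0;
    memset(T, 0, sizeof(int) * MAX_NT * MAX_T);   /* every entry starts as ERROR */
    for (int p = 0; p < nprods; p++) {
        assert(prods[p].lhs >= 0 && prods[p].lhs < MAX_NT);
        for (int a = 0; a < MAX_T; a++) {
            if (!((prods[p].predict >> a) & 1u))
                continue;
            int *entry = &T[prods[p].lhs][a];
            if (*entry == ERROR) {
                *entry = p + 1;
            } else if (*entry != CONFLICT) {
                *entry = CONFLICT;   /* two productions share this lookahead */
                conflicts++;
            }
        }
    }
    return conflicts;
}
```

With the five productions of the if-grammar in §5.7 (terminals if, else, other, ; numbered 0 to 3), `build_table` reports one conflict, at T[E, else].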


Figure 5.5: The LL(1) table for Micro


5.3 LL(1) parsers


Similar to scanners, there are two

kinds of parsers:

1. built-in: recursive descent

2. table-driven


1. built-in


    stmt()
    {
        token = next_token();   /* peek at the lookahead; match() consumes tokens */
        switch (token) {
        case ID:        /* production 5: <stmt> -> ID := <exp> ; */
            match(ID);
            match(ASSIGN);
            exp();
            match(SEMICOLON);
            break;
        case READ:      /* production 6 */
            ...
            break;
        case WRITE:     /* production 7 */
            ...
            break;
        default:
            syntax_error(....);
        }
    }


These recursive descent parsing procedures can be generated automatically from the grammar:

grammar → [parser generator] → LL(1) table / recursive descent parser

However, it is difficult for the parser generator to integrate the semantic routines into the generated recursive descent parser automatically.


2. table-driven parser


(+) generic driver: only the LL(1) table needs to be changed when the grammar is modified.

(+) non-recursive (typically faster): the parser maintains its own stack, so there are no recursive calls.


    lldriver()
    {
        push(START_SYMBOL);
        a := next_token();
        while stack is not empty do
        {
            X := symbol on stack top;
            if (X is a nonterminal && T[X, a] == X → Y1 Y2 ... Ym)
            {
                pop(1);
                push Ym, Ym-1, ..., Y1;     /* Y1 ends up on top */
            }
            else if (X == a)                /* terminal matches the lookahead */
            {
                pop(1);
                a := next_token();
            }
            else if (X is an action symbol)
            {
                pop(1);
                call the corresponding semantic routine;
            }
            else
                syntax_error();             /* mismatch or error entry */
        }
    }
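A self-contained C sketch of lldriver for the expression grammar obtained after eliminating left recursion (E → T A, A → + T A | ε, T → P B, B → * P B | ε, P → ID). The table entries come from the predict sets, using Follow(A) = {$} and Follow(B) = {+, $}. Unlike the pseudocode, this sketch pushes the end marker $ below the start symbol so that the end of input is checked by the same matching step; the symbol names, the table() function and the fixed stack size are assumptions, and action symbols are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Terminals first, then nonterminals; END stands for the $ end marker. */
typedef enum { ID, PLUS, STAR, END, E, A, T, B, P } Symbol;
#define IS_NONTERMINAL(x) ((x) >= E)

/* Right-hand sides, each terminated by -1; rhs[0] is unused so that the
   productions are numbered 1..7 in the order E, A, A, T, B, B, P. */
static const int rhs[8][4] = {
    {-1}, {T, A, -1}, {PLUS, T, A, -1}, {-1},
    {P, B, -1}, {STAR, P, B, -1}, {-1}, {ID, -1},
};

/* LL(1) table: production number for (X, a), or 0 for an error entry. */
static int table(Symbol X, Symbol a)
{
    switch (X) {
    case E: return a == ID ? 1 : 0;
    case A: return a == PLUS ? 2 : a == END ? 3 : 0;
    case T: return a == ID ? 4 : 0;
    case B: return a == STAR ? 5 : (a == PLUS || a == END) ? 6 : 0;
    case P: return a == ID ? 7 : 0;
    default: return 0;
    }
}

/* Returns true iff the input (terminated by END) is accepted. */
bool lldriver(const Symbol *input)
{
    Symbol stack[64];
    int sp = 0;
    stack[sp++] = END;              /* bottom marker, matched against $ */
    stack[sp++] = E;                /* start symbol */
    Symbol a = *input;              /* current lookahead */
    while (sp > 0) {
        Symbol X = stack[sp - 1];
        if (IS_NONTERMINAL(X)) {
            int p = table(X, a);
            if (p == 0)
                return false;       /* syntax error: empty table entry */
            sp--;                   /* pop X */
            int n = 0;
            while (rhs[p][n] != -1)
                n++;
            for (int i = n - 1; i >= 0; i--) {   /* push Ym ... Y1 */
                assert(sp < 64);
                stack[sp++] = (Symbol)rhs[p][i];
            }
        } else if (X == a) {
            sp--;                   /* matched terminal */
            if (a != END)
                a = *++input;       /* advance to the next token */
        } else {
            return false;           /* terminal mismatch */
        }
    }
    return true;
}
```

For ID + ID * ID $ the driver expands and matches until only $ is left and then accepts; for ID + $ it stops at the empty entry T[T, $].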


Ex. Input: begin A := B - 3 + A; end $

Initially a = begin, X = <GOAL>, and the parse stack holds only <GOAL>.

Trace the action of the parser on this example.


5.5 Action symbols


Action symbols may be processed by the parser in a similar way.

1. In recursive descent parsers

Ex. gen_action("ID := <exp> #assign ;") will generate the following code:

    match(ID);
    match(ASSIGN);
    exp();
    assign();
    match(SEMICOLON);

Parameters are transmitted through a semantic stack. The semantic stack is a stack of semantic records; the parser stack is a stack of grammar (and action) symbols.
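A runnable sketch of the generated code for ID := <exp> #assign ; with a semantic stack. The token layout, the record type, the extra #add action in <exp> → INTLIT { + INTLIT #add }, and the convention that match() pushes a semantic record whenever it consumes an ID or INTLIT are all assumptions for illustration; exp() is called expr() here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds; ID carries a name, INTLIT carries a value. */
typedef enum { ID, ASSIGN, INTLIT, PLUS, SEMICOLON, EOS } Kind;
typedef struct { Kind kind; const char *name; int value; } Token;

/* Semantic record: a variable name or an integer value. */
typedef struct { const char *name; int value; } SemRecord;

static SemRecord sem_stack[32];
static int sem_top;
static const Token *tok;            /* current (peeked) token */

/* Result of the last #assign (illustrative stand-in for code generation). */
const char *last_var;
int last_value;

static void push_sem(SemRecord r) { assert(sem_top < 32); sem_stack[sem_top++] = r; }
static SemRecord pop_sem(void)    { assert(sem_top > 0);  return sem_stack[--sem_top]; }

/* Consumes a token of kind k; IDs and INTLITs also push a semantic record. */
static int match(Kind k)
{
    if (tok->kind != k)
        return 0;
    if (k == ID || k == INTLIT)
        push_sem((SemRecord){ tok->name, tok->value });
    tok++;
    return 1;
}

/* Action routines take their parameters from the semantic stack. */
static void add(void)
{
    SemRecord right = pop_sem();
    SemRecord left  = pop_sem();
    push_sem((SemRecord){ NULL, left.value + right.value });
}

static void assign(void)
{
    SemRecord value  = pop_sem();
    SemRecord target = pop_sem();
    last_var = target.name;
    last_value = value.value;
}

/* <exp> -> INTLIT { + INTLIT #add } */
static int expr(void)
{
    if (!match(INTLIT))
        return 0;
    while (tok->kind == PLUS) {
        match(PLUS);
        if (!match(INTLIT))
            return 0;
        add();                      /* #add */
    }
    return 1;
}

/* Code generated for  ID := <exp> #assign ;  */
int assignment_stmt(const Token *input)
{
    tok = input; sem_top = 0;
    if (!match(ID) || !match(ASSIGN) || !expr())
        return 0;
    assign();                       /* #assign */
    return match(SEMICOLON) && tok->kind == EOS;
}
```

For x := 2 + 3 ; the semantic stack holds x, 2, 3, then x, 5 after #add, and #assign pops both records.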


2. In the LL(1) driver

Action symbols are pushed onto the parse stack in the same way as grammar symbols. When an action symbol is on the stack top, the driver pops it and calls the corresponding semantic routine (see lldriver above).

Parameters are transmitted through the semantic stack.
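A sketch of action symbols inside the table-driven driver: the parse stack holds terminals, nonterminals and action symbols, while a separate integer semantic stack carries the parameters. The grammar E → NUM #push R, R → + NUM #push #add R | ε is a hypothetical example that sums its input; #push copies the value of the most recently matched token onto the semantic stack.

```c
#include <assert.h>
#include <stdbool.h>

/* Terminals, nonterminals, then action symbols; all share one parse stack. */
typedef enum { NUM, PLUS, END, E, R, ACT_PUSH, ACT_ADD } Sym;
typedef struct { Sym kind; int value; } Tok;

/* 1: E -> NUM #push R    2: R -> + NUM #push #add R    3: R -> eps */
static const Sym rhs1[] = { NUM, ACT_PUSH, R };
static const Sym rhs2[] = { PLUS, NUM, ACT_PUSH, ACT_ADD, R };
static const struct { const Sym *syms; int len; } prod[4] = {
    { 0, 0 }, { rhs1, 3 }, { rhs2, 5 }, { 0, 0 },
};

static int table(Sym X, Sym a)
{
    if (X == E) return a == NUM ? 1 : 0;
    if (X == R) return a == PLUS ? 2 : a == END ? 3 : 0;
    return 0;
}

/* Returns true and stores the sum in *result if the input parses. */
bool lldriver_actions(const Tok *in, int *result)
{
    Sym parse[32]; int psp = 0;     /* parse stack: grammar and action symbols */
    int sem[32];   int ssp = 0;     /* semantic stack: integer values */
    int last = 0;                   /* value of the most recently matched token */
    parse[psp++] = END;
    parse[psp++] = E;
    while (psp > 0) {
        Sym X = parse[--psp];       /* pop the top symbol */
        if (X == E || X == R) {
            int p = table(X, in->kind);
            if (p == 0)
                return false;       /* error entry */
            for (int i = prod[p].len - 1; i >= 0; i--) {   /* push RHS reversed */
                assert(psp < 32);
                parse[psp++] = prod[p].syms[i];
            }
        } else if (X == ACT_PUSH) { /* #push */
            assert(ssp < 32);
            sem[ssp++] = last;
        } else if (X == ACT_ADD) {  /* #add: replace the top two values by their sum */
            assert(ssp >= 2);
            ssp--;
            sem[ssp - 1] += sem[ssp];
        } else if (X == in->kind) { /* terminal matches the lookahead */
            last = in->value;
            if (X != END)
                in++;
        } else {
            return false;           /* terminal mismatch */
        }
    }
    assert(ssp == 1);
    *result = sem[0];
    return true;
}
```

Because #add sits on the parse stack right after the second operand's #push, it runs only once both values are on the semantic stack.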


§5.6 Making grammars LL(1)


Not all grammars are LL(1). However, some non-LL(1) grammars can be made LL(1) by simple modifications.

When is a grammar not LL(1)? When some entry in the parse table contains more than one production.

Ex. T[<stmt>, ID] = 2,5 (two productions in one entry)

This is called a conflict: we do not know which production to use when <stmt> is on the stack top and ID is the next input token.


Conflicts are classified into two categories:

1. common prefix
2. left recursion

Common prefix

Ex.
<stmt> → if <exp> then <stmt>
<stmt> → if <exp> then <stmt> else <stmt>

Consider the case when <stmt> is on the stack top and "if" is the next input token. We cannot decide which production to use at this point.

In general, if we have two productions

$A \to \alpha$
$A \to \beta$

and $\mathrm{First}(\alpha) \cap \mathrm{First}(\beta) \neq \emptyset$, then we have a conflict.


Solution: factor out the common prefix.

Ex.
<stmt> → if <exp> then <stmt> <tail>
<tail> → ε
<tail> → else <stmt>


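2. Left recursion: productions of the form

$A \to A\alpha$

Grammars with left-recursive productions are not LL(1), because we may have

$A \Rightarrow A\alpha \Rightarrow A\alpha\alpha \Rightarrow \cdots$

with the same lookahead at every step: any token that predicts a non-left-recursive alternative $A \to \beta$ also predicts $A \to A\alpha$, since $\mathrm{First}(\beta) \subseteq \mathrm{First}(A\alpha)$.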


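Solution: replace the productions

$A \to A\alpha$
$A \to \beta$
$A \to \gamma$

(where $\beta$ and $\gamma$ do not begin with A).

Intuition: all the strings derivable from A have the form

$\beta, \beta\alpha, \beta\alpha\alpha, \beta\alpha\alpha\alpha, \dots$
$\gamma, \gamma\alpha, \gamma\alpha\alpha, \gamma\alpha\alpha\alpha, \dots$

So we may use the following productions instead:

$A \to \beta T$
$A \to \gamma T$
$T \to \varepsilon$
$T \to \alpha T$

Left recursion is replaced by right recursion.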
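As a worked check, here is the string $\beta\alpha\alpha$ derived under both sets of productions. In the left-recursive version the parser would have to guess at the first step how many times to apply $A \to A\alpha$; in the rewritten version each step is decided by the next token ($T \to \alpha T$ on a token in $\mathrm{First}(\alpha)$, $T \to \varepsilon$ on a token in $\mathrm{Follow}(A)$), assuming $\alpha$ does not derive $\varepsilon$ and $\mathrm{First}(\alpha) \cap \mathrm{Follow}(A) = \emptyset$.

```latex
\begin{aligned}
\text{left-recursive:}  \quad & A \Rightarrow A\alpha \Rightarrow A\alpha\alpha \Rightarrow \beta\alpha\alpha \\
\text{right-recursive:} \quad & A \Rightarrow \beta T \Rightarrow \beta\alpha T \Rightarrow \beta\alpha\alpha T \Rightarrow \beta\alpha\alpha
\end{aligned}
```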


Ex. Given the left-recursive grammar:

E → E + T
E → T
T → T * P
T → P
P → ID

After eliminating left recursion, we get

E → T A
A → ε
A → + T A
T → P B
B → ε
B → * P B
P → ID


3. More general solution

Ex.
<stmt> → <label> <unlabeled stmt>
<label> → ID :
<label> → ε
<unlabeled stmt> → ID := <exp> ;

We cannot decide which production to use when <label> is on the stack top and ID is the next token: the ID may begin a label (<label> → ID :), or <label> may derive ε, in which case the ID begins the <unlabeled stmt>. In both cases the lookahead is ID.

YANG YANG 24


Solution: use the following productions (which essentially look ahead 2 tokens):

<stmt> → ID <suffix>
<suffix> → : <unlabeled stmt>
<suffix> → := <exp> ;
<unlabeled stmt> → ID := <exp> ;

Try two examples:

A: B := C ;
B := C ;


4. For more difficult cases, we use semantic routines to help parsing.

Ex. In Ada, we may declare arrays as

A: array(I .. J, BOOLEAN)

A straightforward grammar for an array bound is

<bound> → <exp> .. <exp>
<bound> → ID
<exp> → ID
<exp> → ...

and ID ∈ First(<exp>).

This grammar is not LL(1) because we cannot make a decision when <bound> is on the stack top and ID is the next token.


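Solution:

<bound> → <exp> <tail>
<tail> → ε
<tail> → .. <exp>

When <tail> derives ε, a semantic routine can check that the lone <exp> actually names a type (such as BOOLEAN).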
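A sketch of the factored <bound> grammar with the semantic check performed when <tail> derives ε. The table of type names, the token layout and the result codes are hypothetical, and <exp> is simplified to a single ID; a real compiler would consult its symbol table instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical token: kind plus spelling for identifiers. */
typedef enum { ID, DOTDOT, EOS } Kind;
typedef struct { Kind kind; const char *text; } Token;

/* Assumed set of type names known to the semantic routines. */
static const char *const type_names[] = { "BOOLEAN", "INTEGER", "CHARACTER" };

static bool is_type_name(const char *name)
{
    for (size_t i = 0; i < sizeof type_names / sizeof type_names[0]; i++)
        if (strcmp(name, type_names[i]) == 0)
            return true;
    return false;
}

typedef enum { BOUND_ERROR, BOUND_RANGE, BOUND_TYPE } BoundKind;

/* <bound> -> <exp> <tail>;  <tail> -> .. <exp> | eps;  <exp> simplified to ID. */
BoundKind parse_bound(const Token *in)
{
    assert(in != NULL);
    if (in->kind != ID)
        return BOUND_ERROR;
    const char *first = in->text;       /* semantic record for the first <exp> */
    in++;
    if (in->kind == DOTDOT) {           /* <tail> -> .. <exp> */
        in++;
        if (in->kind != ID)
            return BOUND_ERROR;
        in++;
        return in->kind == EOS ? BOUND_RANGE : BOUND_ERROR;
    }
    /* <tail> -> eps: the parser alone cannot tell "I" from "BOOLEAN",
       so a semantic routine checks that the lone <exp> names a type. */
    if (in->kind != EOS)
        return BOUND_ERROR;
    return is_type_name(first) ? BOUND_TYPE : BOUND_ERROR;
}
```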

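Every context-free grammar whose language does not contain ε can be transformed into Greibach Normal Form, in which every production has the form

$A \to a\alpha$   (a is a terminal)

So given a grammar G, we can do

G → GNF → no common prefix, no left recursion

but the result is still not necessarily LL(1)!

Ex.
S → a A a
S → b A b a
A → b
A → ε

Consider the case when A is on the stack top and b is the next token: both A → b and A → ε could apply.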


§5.7 The dangling-else problem


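Consider

if a then if b then x := 1 else x := 2

Two possibilities:

1. The else belongs to the inner if: if a then (if b then x := 1 else x := 2). When a is true, x := 1 or x := 2 is executed depending on b.
2. The else belongs to the outer if: if a then (if b then x := 1) else x := 2. When a is false, x := 2 is executed.

The problem is which "if" the "else" belongs to.

In essence, writing [ for if and ] for else, we are trying to find an LL(1) grammar for the set

$\{\, [^i\, ]^j \mid i \ge j \ge 0 \,\}$

But is it possible?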


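1st attempt: G1

S → [ S C
S → ε
C → ]
C → ε

This grammar is ambiguous. Consider [ [ ], which has two parse trees. Both start with S ⇒ [ S C ⇒ [ [ S C C, and then:

(a) the inner C → ] and the outer C → ε, so ] matches the second [;
(b) the inner C → ε and the outer C → ], so ] matches the first [.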


2nd attempt: we can make ] be associated with the nearest unpaired [ as follows:

S → [ S
S → T
T → [ T ]
T → ε

This grammar is not ambiguous. Consider [ [ ]:

S ⇒ [ S ⇒ [ T ⇒ [ [ T ] ⇒ [ [ ]

However, this grammar is not LL(1) either. Consider the case when S is on the stack top and [ is the next input token:

[ ∈ First([ S)
[ ∈ First(T)

This grammar can be parsed with a bottom-up parser, but not with a top-down parser.


Solution: conflicts + special rules

1. G → S ;
2. S → if S E
3. S → other
4. E → else S
5. E → ε

The parse table:

          if    else   other   ;
    G     1            1
    S     2            3
    E           4,5            5

The entry T[E, else] is a conflict. We can enforce T[E, else] = 4 (the 4th rule). This essentially forces "else" to be matched with the nearest unpaired "if".
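A table-driven sketch of this grammar with the special rule T[E, else] = 4 hard-coded. To make the effect visible, each E pushed by S → if S E remembers which if (0-based, in input order) created it, so the driver can report which if each else is matched with. The symbol encoding, the owner field and the reporting array are assumptions of this sketch; the input must end with ;.

```c
#include <assert.h>
#include <stdbool.h>

/* Terminals, then nonterminals, for 1: G -> S ;  2: S -> if S E
   3: S -> other  4: E -> else S  5: E -> eps */
typedef enum { IF, ELSE, OTHER, SEMI, G, S, E } Sym;

typedef struct { Sym sym; int owner; } Entry;   /* owner: the if an E belongs to */

/* Parse table; T[E, else] would hold {4, 5}, resolved here to 4. */
static int table(Sym X, Sym a)
{
    switch (X) {
    case G: return (a == IF || a == OTHER) ? 1 : 0;
    case S: return a == IF ? 2 : a == OTHER ? 3 : 0;
    case E: return a == ELSE ? 4 : a == SEMI ? 5 : 0;   /* special rule */
    default: return 0;
    }
}

/* else_owner[k] receives the index of the if matched with the k-th else. */
bool parse_dangling(const Sym *in, int *else_owner)
{
    Entry st[64]; int sp = 0;
    int ifs = 0, elses = 0;
    st[sp++] = (Entry){ G, -1 };
    while (sp > 0) {
        Entry x = st[--sp];
        assert(sp < 60);                    /* room for up to three pushes */
        if (x.sym >= G) {
            switch (table(x.sym, *in)) {
            case 1: st[sp++] = (Entry){ SEMI, -1 }; st[sp++] = (Entry){ S, -1 }; break;
            case 2: st[sp++] = (Entry){ E, ifs };   /* E remembers its if */
                    st[sp++] = (Entry){ S, -1 };
                    st[sp++] = (Entry){ IF, -1 }; ifs++; break;
            case 3: st[sp++] = (Entry){ OTHER, -1 }; break;
            case 4: else_owner[elses++] = x.owner;
                    st[sp++] = (Entry){ S, -1 }; st[sp++] = (Entry){ ELSE, -1 }; break;
            case 5: break;                  /* E -> eps */
            default: return false;          /* error entry */
            }
        } else if (x.sym == *in) {
            in++;                           /* match terminal */
        } else {
            return false;
        }
    }
    return true;
}
```

For if if other else other ; the driver reports that the else belongs to if number 1, the inner one.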


Alternative solution: change the language. Add "end if" at the end of every "if":

S → if S E
S → other
E → else S end if
E → end if


§5.9 Properties of LL(1) parsers:


A correct leftmost parse is guaranteed.

All LL(1) grammars are unambiguous.

Linear time and linear space.


§ llgen


Page 776 of the book

output from llgen

*define

decrtn 1

ifprocess 2


§ LL(k) parsing


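Recall that a grammar is LL(1) iff for any two distinct productions $A \to \alpha$ and $A \to \beta$,

$\mathrm{First}(\alpha\,\mathrm{Follow}(A)) \cap \mathrm{First}(\beta\,\mathrm{Follow}(A)) = \emptyset$

To generalize, we say that G is strong LL(k) iff for any two distinct productions $A \to \alpha$ and $A \to \beta$,

$\mathrm{First}_k(\alpha\,\mathrm{Follow}_k(A)) \cap \mathrm{First}_k(\beta\,\mathrm{Follow}_k(A)) = \emptyset$

The word "strong" means that this condition is stronger than necessary: it uses $\mathrm{Follow}_k(A)$, which merges every context in which A can appear, instead of the context in which A actually appears.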


Consider

G → S $
S → a A a
S → b A b a
A → b
A → ε

⇒ This grammar is not LL(1).

When A is on the stack top and b is the next token, we cannot choose between A → b and A → ε.

    stack: A ...      input: b ...

Does it help if we can look ahead two tokens?

NO! If the next two tokens are bb, then we should choose A → b. If the next two tokens are ba, then we cannot make a choice.


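Case 1. Input is aba$ (parse stack written with its top on the left):

    G            lookahead ab: expand G → S $, then S → a A a
    a A a $      match a
    A a $        lookahead ba: at this point, we should choose A → b

Case 2. Input is bba$:

    G            lookahead bb: expand G → S $, then S → b A b a
    b A b a $    match b
    A b a $      lookahead ba: at this point, we should choose A → ε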


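So the problem is not the limited number of lookahead tokens. The problem is the "context": the same lookahead ba calls for different choices depending on whether A came from S → a A a or from S → b A b a.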


Therefore, the grammar is not strong LL(1).

Actually, we can verify that the grammar is not strong LL(k) for any k ≥ 1 by verifying that

$\mathrm{First}_k(ba\$) \subseteq \mathrm{First}_k(b\,\mathrm{Follow}_k(A)) \cap \mathrm{First}_k(\varepsilon\,\mathrm{Follow}_k(A))$

for all k ≥ 1.
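For k = 2 the check works out as follows. Since G → S $, we have $\mathrm{Follow}_2(S) = \{\$\}$, and $\mathrm{Follow}_2(A)$ collects the two-token strings that can follow A in S → a A a and S → b A b a:

```latex
\begin{aligned}
\mathrm{Follow}_2(A) &= \mathrm{First}_2(a\$) \cup \mathrm{First}_2(ba\$) = \{a\$,\ ba\} \\
\mathrm{First}_2(b\,\mathrm{Follow}_2(A)) &= \mathrm{First}_2(\{ba\$,\ bba\}) = \{ba,\ bb\} \\
\mathrm{First}_2(\varepsilon\,\mathrm{Follow}_2(A)) &= \{a\$,\ ba\} \\
\mathrm{First}_2(b\,\mathrm{Follow}_2(A)) \cap \mathrm{First}_2(\varepsilon\,\mathrm{Follow}_2(A)) &= \{ba\} = \mathrm{First}_2(ba\$)
\end{aligned}
```

The intersection is not empty, so the grammar is not strong LL(2).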


However, it is possible to parse the language of the grammar under the following conditions:

1. look ahead two tokens
2. from left to right
3. using the left context

We call such grammars LL(2), rather than strong LL(2).

Note that LL(2) ≠ strong LL(2), but LL(1) = strong LL(1).


LL(k) parsers:

Each nonterminal A is split into states $[A, L_1], [A, L_2], \dots$, where each $L_i$ is a set of terminal strings of length ≤ k.

Let [A, L] be the nonterminal on top of the stack, and let z be the lookahead (|z| = k). At this point, we choose the production $A \to \alpha$ only if $z \in \mathrm{First}_k(\alpha y)$ for some $y \in L$.

Note. If there exist a state [A, L] and two productions $A \to \alpha$, $A \to \beta$ such that

$\left(\bigcup_{y \in L} \mathrm{First}_k(\alpha y)\right) \cap \left(\bigcup_{y \in L} \mathrm{First}_k(\beta y)\right) \neq \emptyset$

then the grammar is not LL(k).


When [A, L] is the state on the stack top, assume we choose the production $A \to \alpha$. Let

$\alpha = X_0 B_1 X_1 B_2 \cdots B_m X_m$

where the $X_i$ are (possibly empty) terminal strings and the $B_i$ are nonterminals.

Pop [A, L] from the stack. Push

$X_0\, [B_1, L_1]\, X_1\, [B_2, L_2] \cdots [B_m, L_m]\, X_m$

onto the stack, where

$L_i = \bigcup_{y \in L} \mathrm{First}_k(X_i B_{i+1} X_{i+1} \cdots B_m X_m\, y)$

The start symbol is $[S, \{\varepsilon\}]$.


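Ex.
G → S $
S → a A a
S → b A b a
A → b
A → ε

1. $\mathrm{First}_2(A) = \{b, \varepsilon\}$
   $\mathrm{First}_2(S) = \{ab, aa, bb\}$
   $\mathrm{First}_2(G) = \mathrm{First}_2(S) = \{ab, aa, bb\}$

2. $[G, \{\varepsilon\}]$ is the start symbol.

3. Consider the production $G \to S\$$ in state $[G, \{\varepsilon\}]$:
   $z \in \mathrm{First}_2(S\$\varepsilon) = \{ab, aa, bb\}$
   $L_1 = \mathrm{First}_2(\$\varepsilon) = \{\$\}$
   This means $[G, \{\varepsilon\}] \to [S, \{\$\}]\,\$$, predicted by ab, aa, bb.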


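4. Consider the production $S \to aAa$ in state $[S, \{\$\}]$:
   $z \in \mathrm{First}_2(aAa\$) = \{ab, aa\}$
   $L_1 = \mathrm{First}_2(a\$) = \{a\$\}$
   This means $[S, \{\$\}] \to a\,[A, \{a\$\}]\,a$, predicted by ab, aa.

5. Consider the production $S \to bAba$ in state $[S, \{\$\}]$:
   $z \in \mathrm{First}_2(bAba\$) = \{bb\}$
   $L_1 = \mathrm{First}_2(ba\$) = \{ba\}$
   This means $[S, \{\$\}] \to b\,[A, \{ba\}]\,ba$, predicted by bb.

6. Consider $A \to b$ in state $[A, \{a\$\}]$:
   $z \in \mathrm{First}_2(ba\$) = \{ba\}$; no $L_i$ (the right-hand side has no nonterminals).
   This means $[A, \{a\$\}] \to b$, predicted by ba.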


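7. Consider $A \to b$ in state $[A, \{ba\}]$:
   $z \in \mathrm{First}_2(bba) = \{bb\}$; no $L_i$.
   This means $[A, \{ba\}] \to b$, predicted by bb.

8. Consider $A \to \varepsilon$ in state $[A, \{a\$\}]$:
   $z \in \mathrm{First}_2(a\$) = \{a\$\}$; no $L_i$.
   This means $[A, \{a\$\}] \to \varepsilon$, predicted by a$.

9. Consider $A \to \varepsilon$ in state $[A, \{ba\}]$:
   $z \in \mathrm{First}_2(ba) = \{ba\}$; no $L_i$.
   This means $[A, \{ba\}] \to \varepsilon$, predicted by ba.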


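In summary, we have 7 productions (predicting lookaheads on the left):

    ab, aa, bb:   $[G, \{\varepsilon\}] \to [S, \{\$\}]\,\$$
    ab, aa:       $[S, \{\$\}] \to a\,[A, \{a\$\}]\,a$
    bb:           $[S, \{\$\}] \to b\,[A, \{ba\}]\,ba$
    ba:           $[A, \{a\$\}] \to b$
    a$:           $[A, \{a\$\}] \to \varepsilon$
    bb:           $[A, \{ba\}] \to b$
    ba:           $[A, \{ba\}] \to \varepsilon$

Note that there is no conflict on the lookaheads. Therefore, the grammar is LL(2).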
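A sketch of an LL(2) driver for these seven productions. Stack symbols are single characters: terminals a, b, $ stand for themselves, and the states are encoded (hypothetically) as G = [G,{ε}], S = [S,{$}], X = [A,{a$}], Y = [A,{ba}]. The lookahead z is simply the next two characters of the input; the function names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns the RHS selected by state X and 2-token lookahead z, or NULL for an
   error entry; the empty string means epsilon. */
static const char *ll2_table(char X, const char *z)
{
    switch (X) {
    case 'G': if (!strncmp(z, "ab", 2) || !strncmp(z, "aa", 2) || !strncmp(z, "bb", 2))
                  return "S$";
              break;
    case 'S': if (!strncmp(z, "ab", 2) || !strncmp(z, "aa", 2)) return "aXa";
              if (!strncmp(z, "bb", 2)) return "bYba";
              break;
    case 'X': if (!strncmp(z, "ba", 2)) return "b";     /* [A,{a$}] -> b   */
              if (!strncmp(z, "a$", 2)) return "";      /* [A,{a$}] -> eps */
              break;
    case 'Y': if (!strncmp(z, "bb", 2)) return "b";     /* [A,{ba}] -> b   */
              if (!strncmp(z, "ba", 2)) return "";      /* [A,{ba}] -> eps */
              break;
    }
    return NULL;
}

/* LL(2) driver: input is a string over {a, b} ending in '$'. */
bool ll2_parse(const char *in)
{
    char st[64];
    int sp = 0;
    st[sp++] = 'G';                              /* start symbol [G,{eps}] */
    while (sp > 0) {
        char X = st[--sp];
        if (X == 'a' || X == 'b' || X == '$') {
            if (*in != X)
                return false;                    /* terminal mismatch */
            in++;
        } else {
            const char *rhs = ll2_table(X, in);  /* lookahead = next 2 chars */
            if (rhs == NULL)
                return false;
            for (int i = (int)strlen(rhs) - 1; i >= 0; i--) {   /* push reversed */
                assert(sp < 64);
                st[sp++] = rhs[i];
            }
        }
    }
    return *in == '\0';
}
```

ll2_parse("abba$") fails at [A,{a$}] with lookahead bb, and ll2_parse("bbba$") succeeds, in line with the hand traces that follow.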


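Now let's parse the string abba$:

$[G, \{\varepsilon\}] \Rightarrow [S, \{\$\}]\,\$$
$\Rightarrow a\,[A, \{a\$\}]\,a\,\$$   match a
$\Rightarrow$ fail (the lookahead bb predicts no production of $[A, \{a\$\}]$)

Parse bbba$:

$[G, \{\varepsilon\}] \Rightarrow [S, \{\$\}]\,\$$
$\Rightarrow b\,[A, \{ba\}]\,b\,a\,\$$   match 1st b
$\Rightarrow b\,b\,a\,\$$   (lookahead bb) match 2nd b
match 3rd b
match a

Do you DARE to try exercise 11 on page 139?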


Some results:

LL(k) ⊂ LL(k+1)
strong LL(k) ⊂ strong LL(k+1)
strong LL(k) ⊂ LL(k) for all k > 1
strong LL(1) = LL(1)

$L_k = \{\, a^n w \mid w \in \{b,\ b^k d\}^n,\ n \ge 1 \,\}$ needs k-token lookahead.

Strong LL(1)'s table is larger.

error detection