
# Chap 5 LL(1) Parsing

YANG YANG

LL(1):

- L: left-to-right scanning
- L: leftmost derivation
- (1): 1-token lookahead

With a parser generator, parsing becomes the easiest part! Modifying parsers is also convenient.


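Given the productions

$$A \to \alpha_1 \qquad A \to \alpha_2 \qquad \cdots \qquad A \to \alpha_n$$

during a (leftmost) derivation we may have

$$\cdots A \cdots \;\Rightarrow\; \cdots \alpha_1 \cdots \quad\text{or}\quad \Rightarrow\; \cdots \alpha_2 \cdots \quad\text{or}\quad \cdots \quad\text{or}\quad \Rightarrow\; \cdots \alpha_n \cdots$$

Which route should we choose? (Trial and error, i.e. backtracking, is not a good idea.)

» Use the lookahead symbols.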


Consider the situation: we are about to expand a nonterminal A, and there are several productions whose left-hand side is A:

$$A \to \alpha_1 \qquad A \to \alpha_2 \qquad \cdots \qquad A \to \alpha_n$$

We choose one of the productions based on the lookahead token. Which one should we choose? Consider

$$\text{First}(\alpha_1),\ \text{First}(\alpha_2),\ \ldots,\ \text{First}(\alpha_n)$$

and, if $\alpha_i \Rightarrow^* \lambda$, consider also $\text{Follow}(A)$.


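Define

$$\text{predict}(A \to \alpha) = \begin{cases} (\text{First}(\alpha) - \{\lambda\}) \cup \text{Follow}(A) & \text{if } \lambda \in \text{First}(\alpha) \\ \text{First}(\alpha) & \text{otherwise} \end{cases}$$

If the lookahead token $a \in \text{predict}(A \to \alpha)$, then we use the production $A \to \alpha$ to expand A.

What if $a \in \text{predict}(A \to \alpha_1)$ and $a \in \text{predict}(A \to \alpha_2)$?

What if $a \notin \text{predict}(A \to \alpha)$ for all productions $A \to \alpha$ whose left-hand side is A?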
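The fixed-point computation behind First, Follow and predict can be sketched in Python. This is a minimal illustration, not the book's code: it assumes a grammar given as (lhs, rhs) pairs, uses `""` for λ and `$` as the end marker, and runs on the expression grammar obtained in §5.6 after eliminating left recursion. All names (`GRAMMAR`, `first_of`, `compute_first`, `compute_follow`, `predict`) are hypothetical.

```python
# Illustrative grammar: the expression grammar after left-recursion removal.
# "" stands for λ (the empty string) and "$" is the end marker.
GRAMMAR = [
    ("E", ["T", "A"]),
    ("A", []),
    ("A", ["+", "T", "A"]),
    ("T", ["P", "B"]),
    ("B", []),
    ("B", ["*", "P", "B"]),
    ("P", ["ID"]),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
LAMBDA = ""


def first_of(symbols, first):
    """First set of a string of grammar symbols (contains LAMBDA if it can vanish)."""
    result = set()
    for sym in symbols:
        if sym not in NONTERMINALS:       # a terminal starts the string
            result.add(sym)
            return result
        result |= first[sym] - {LAMBDA}
        if LAMBDA not in first[sym]:      # sym cannot derive λ: stop here
            return result
    result.add(LAMBDA)                    # every symbol can derive λ
    return result


def compute_first():
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:                        # iterate until nothing changes
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first_of(rhs, first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def compute_follow(first, start="E"):
    follow = {nt: set() for nt in NONTERMINALS}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                rest = first_of(rhs[i + 1:], first)
                new = rest - {LAMBDA}
                if LAMBDA in rest:        # sym can end lhs: Follow(lhs) ⊆ Follow(sym)
                    new |= follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


def predict(lhs, rhs, first, follow):
    f = first_of(rhs, first)
    if LAMBDA in f:
        return (f - {LAMBDA}) | follow[lhs]
    return f


if __name__ == "__main__":
    first = compute_first()
    follow = compute_follow(first)
    for lhs, rhs in GRAMMAR:
        print(lhs, "->", " ".join(rhs) or "λ", sorted(predict(lhs, rhs, first, follow)))
```

For this grammar the two λ-productions are predicted by their Follow sets: A → λ by {$} and B → λ by {+, $}.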


Property of LL(1) grammars: if a grammar is LL(1), then for any two distinct productions A → α and A → β,

$$\text{First}(\alpha\,\text{Follow}(A)) \cap \text{First}(\beta\,\text{Follow}(A)) = \emptyset$$


Figure 5.1: A Micro grammar in standard form

Given the First and Follow sets in Figs. 5.2 and 5.3, calculate the predict set for each production.


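## §5.2 LL(1) Parse Table

The predict() function may be represented as an LL(1) parse table

$$T : V_n \times V_t \to P \cup \{\text{error}\}$$

where P is the set of productions. Rows are indexed by nonterminals (A, B, …) and columns by terminals (a, b, …); each entry holds a production (for example, production 3) or error:

$$T[A, a] = \begin{cases} A \to \alpha & \text{if } a \in \text{predict}(A \to \alpha) \\ \text{error} & \text{otherwise} \end{cases}$$

A grammar is LL(1) iff every entry in the parse table contains a unique production or the error flag.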
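Building the table from predict sets, and spotting conflicts, takes only a few lines. In this Python sketch the predict sets of the if/else grammar of §5.7 are hard-coded (worked out by hand, an assumption of the example) rather than computed, and each entry is a list so that a conflict shows up as a list with more than one production number; `build_table` and `conflicts` are illustrative names.

```python
from collections import defaultdict

# Predict sets of the if/else grammar from §5.7, worked out by hand:
#   1. G -> S ;   2. S -> if S E   3. S -> other   4. E -> else S   5. E -> λ
# Predict(5) = Follow(E) = Follow(S) = {;, else}.
PREDICT = {
    1: ("G", {"if", "other"}),
    2: ("S", {"if"}),
    3: ("S", {"other"}),
    4: ("E", {"else"}),
    5: ("E", {";", "else"}),
}


def build_table(predict):
    """T[A][a] -> list of production numbers; a missing entry means error."""
    table = defaultdict(dict)
    for number in sorted(predict):
        lhs, lookaheads = predict[number]
        for a in lookaheads:
            table[lhs].setdefault(a, []).append(number)
    return table


def conflicts(table):
    """Entries holding more than one production (the grammar is then not LL(1))."""
    return {(lhs, a): prods
            for lhs, row in table.items()
            for a, prods in row.items()
            if len(prods) > 1}


table = build_table(PREDICT)
print(conflicts(table))   # {('E', 'else'): [4, 5]}
```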


Figure 5.5: The LL(1) table for Micro


## §5.3 LL(1) parsers

Similar to scanners, there are two kinds of parsers:

1. built-in: recursive descent
2. table-driven


1. built-in

stmt()
{
    token = next_token();    /* look at the lookahead; match() is what consumes tokens */
    switch (token) {
    case ID:
        /* production 5: <stmt> --> ID := <exp> ; */
        match(ID);
        match(ASSIGN);
        exp();
        match(SEMICOLON);
        break;
    case READ:    /* production 6 */
        ...
    case WRITE:   /* production 7 */
        ...
    default:
        syntax_error(....);
    }
}
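The same technique can be made runnable on a smaller grammar. This Python sketch assumes the expression grammar E → T A, A → + T A | λ, T → P B, B → * P B | λ, P → ID from §5.6; `next_token` peeks at the lookahead and `match` consumes it, mirroring the helpers above, while the class, method and token names are hypothetical. Each method chooses a production by testing the lookahead against that production's predict set.

```python
class ParseError(Exception):
    pass


class Parser:
    """Recursive descent parser, one method per nonterminal, for
    E -> T A,  A -> + T A | λ,  T -> P B,  B -> * P B | λ,  P -> ID."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks the end of input
        self.pos = 0

    def next_token(self):
        """Return the lookahead token without consuming it."""
        return self.tokens[self.pos]

    def match(self, expected):
        """Consume the lookahead if it is the expected terminal."""
        if self.next_token() != expected:
            raise ParseError(f"expected {expected}, got {self.next_token()}")
        self.pos += 1

    def E(self):                  # E -> T A
        self.T()
        self.A()

    def A(self):
        token = self.next_token()
        if token == "+":          # A -> + T A   (predict {+})
            self.match("+")
            self.T()
            self.A()
        elif token != "$":        # A -> λ       (predict {$})
            raise ParseError(f"unexpected {token} after an operand")

    def T(self):                  # T -> P B
        self.P()
        self.B()

    def B(self):
        token = self.next_token()
        if token == "*":          # B -> * P B   (predict {*})
            self.match("*")
            self.P()
            self.B()
        elif token not in ("+", "$"):   # B -> λ (predict {+, $})
            raise ParseError(f"unexpected {token} after an operand")

    def P(self):                  # P -> ID
        self.match("ID")


def parse(tokens):
    parser = Parser(tokens)
    parser.E()
    parser.match("$")             # the whole input must be consumed
    return True
```

For example, `parse(["ID", "+", "ID", "*", "ID"])` succeeds, while `parse(["ID", "+", "*"])` raises `ParseError` inside `P`.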


These recursive descent parsing procedures can be generated automatically from the grammar:

grammar, LL(1) table → parser generator → recursive descent parser

However, it is difficult for the parser generator to integrate the semantic routines into the (generated) recursive descent parser automatically.


2. table-driven parser

(+) generic driver: only the LL(1) table needs to be changed when the grammar is modified.

(+) non-recursive (faster): the parser maintains a stack itself, so there are no recursive calls.


lldriver()
{
    push(START_SYMBOL);
    a := next input token;
    while stack is not empty do
    {
        X := symbol on stack top;
        if (X is a nonterminal && T[X, a] == X → Y1 Y2 ... Ym) {
            pop(1);
            push Ym, Ym-1, ..., Y1;     /* Y1 ends up on top */
        }
        else if (X == a) {              /* terminal matching the lookahead */
            pop(1);
            a := next input token;
        }
        else if (X is an action symbol) {
            pop(1);
            call the corresponding semantic routine;
        }
        else
            syntax_error();
    }
}
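A Python sketch of lldriver, including the action-symbol branch. It assumes a hand-built table for the expression grammar extended with hypothetical action symbols #push, #add and #mul that evaluate the expression on a semantic stack; the table, the action names and the `$` bottom-of-stack marker are assumptions of this example, and an error entry is simply a missing key.

```python
class ParseError(Exception):
    pass


# Hypothetical LL(1) table for the expression grammar with action symbols:
#   E -> T A      A -> + T #add A | λ      T -> P B
#   B -> * P #mul B | λ                    P -> num #push
TABLE = {
    "E": {"num": ["T", "A"]},
    "A": {"+": ["+", "T", "#add", "A"], "$": []},
    "T": {"num": ["P", "B"]},
    "B": {"*": ["*", "P", "#mul", "B"], "+": [], "$": []},
    "P": {"num": ["num", "#push"]},
}


def push_num(sem, last):
    sem.append(last)                  # semantic record: the number's value


def add(sem, last):
    right, left = sem.pop(), sem.pop()
    sem.append(left + right)


def mul(sem, last):
    right, left = sem.pop(), sem.pop()
    sem.append(left * right)


ACTIONS = {"#push": push_num, "#add": add, "#mul": mul}


def kind(token):
    """Terminal class of a token: integers are 'num', operators are themselves."""
    return "num" if isinstance(token, int) else token


def lldriver(tokens):
    tokens = list(tokens) + ["$"]
    pos = 0
    stack = ["$", "E"]                # parse stack (top is the end of the list)
    sem = []                          # semantic stack
    last = None                       # most recently matched token
    while stack:
        x = stack.pop()
        a = kind(tokens[pos])
        if x in TABLE:                # nonterminal: consult T[X, a]
            rhs = TABLE[x].get(a)
            if rhs is None:
                raise ParseError(f"no entry T[{x}, {a}]")
            stack.extend(reversed(rhs))   # push Ym, ..., Y1 so Y1 is on top
        elif x in ACTIONS:            # action symbol: call its semantic routine
            ACTIONS[x](sem, last)
        elif x == a:                  # terminal matching the lookahead
            last = tokens[pos]
            pos += 1
        else:
            raise ParseError(f"expected {x}, got {a}")
    return sem.pop()
```

With this table, `lldriver([2, "+", 3, "*", 4])` returns 14: #mul fires before #add because the action sits right after the operand it needs.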


Ex. Input: begin A := B - 3 + A; end $

Initially a = begin and X = <GOAL>; the parse stack contains only <GOAL>.

Trace the action of the parser on this example.


## §5.5 Action symbols

Action symbols may be processed by the parser in a similar way.

1. in recursive descent parsers

Ex. gen_action("ID := <exp> #assign ;");

will generate the following code:

match(ID);
match(ASSIGN);
exp();
assign();
match(SEMICOLON);

Parameters are transmitted through a semantic stack.

- The semantic stack is a stack of semantic records.
- The parser stack is a stack of grammar (and action) symbols.
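Here is a sketch of what the generated code and its semantic routine might look like in Python. Assumptions: tokens are (kind, lexeme) pairs, <exp> is reduced to a single ID or INTLIT, the operands are pushed as plain strings rather than full semantic records, and `Store source, target` is a made-up instruction format. The point is that `assign()` takes its parameters from the semantic stack, not as arguments.

```python
class ParseError(Exception):
    pass


class Parser:
    """Recursive descent for  <stmt> -> ID := <exp> #assign ;
    with a simplified (hypothetical) <exp> -> ID | INTLIT."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + [("$", None)]   # (kind, lexeme) pairs
        self.pos = 0
        self.semantic_stack = []
        self.code = []                               # generated instructions

    def next_token(self):
        return self.tokens[self.pos][0]

    def match(self, kind):
        token_kind, lexeme = self.tokens[self.pos]
        if token_kind != kind:
            raise ParseError(f"expected {kind}, got {token_kind}")
        self.pos += 1
        return lexeme

    def stmt(self):
        # Code that gen_action("ID := <exp> #assign ;") would produce,
        # plus (assumed) pushes of semantic records for the operands.
        self.semantic_stack.append(self.match("ID"))   # target record
        self.match("ASSIGN")
        self.exp()                                     # leaves a source record
        self.assign()                                  # action symbol #assign
        self.match("SEMICOLON")

    def exp(self):
        kind = "ID" if self.next_token() == "ID" else "INTLIT"
        self.semantic_stack.append(self.match(kind))

    def assign(self):
        """Semantic routine: its parameters come from the semantic stack."""
        source = self.semantic_stack.pop()
        target = self.semantic_stack.pop()
        self.code.append(f"Store {source}, {target}")
```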


2. in the LL(1) driver

Action symbols are pushed onto the parse stack in the same way as grammar symbols. When an action symbol is on the stack top, the driver calls the corresponding semantic routine (see lldriver above). Parameters are transmitted through the semantic stack.


## §5.6 Making grammars LL(1)

Not all grammars are LL(1). However, some non-LL(1) grammars can be made LL(1) by simple modifications.

When is a grammar not LL(1)? When some entry in the parse table contains more than one production. Ex.:

| | … | ID | … |
|---|---|---|---|
| <stmt> | | 2, 5 | |

This is called a conflict: we do not know which production to use when <stmt> is on the stack top and ID is the next input token.


Conflicts are classified into two categories:

1. common prefix
2. left recursion

Common prefix

Ex.

<stmt> → if <exp> then <stmt>
<stmt> → if <exp> then <stmt> else <stmt>

Consider the case when <stmt> is on the stack top and "if" is the next input token. We cannot choose which production to use at this time.

In general, if we have two productions A → α and A → β with

$$\text{First}(\alpha) \cap \text{First}(\beta) \neq \emptyset$$

then we have a conflict.


Solution: factor out the common prefix.

Ex.

<stmt> → if <exp> then <stmt> <tail>
<tail> → λ
<tail> → else <stmt>
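Left factoring can be sketched as a small Python function. It assumes that all alternatives of the nonterminal share the prefix (as in the if/else example); a full implementation would first group the alternatives by their leading symbols. `left_factor` and the name passed for the new nonterminal are illustrative, and λ is represented by an empty list.

```python
def left_factor(lhs, alternatives, tail_name):
    """Factor the longest prefix common to all alternatives of lhs.
    alternatives: list of symbol lists. Returns (lhs, rhs) productions."""
    prefix = []
    for symbols in zip(*alternatives):        # stops at the shortest alternative
        if all(s == symbols[0] for s in symbols):
            prefix.append(symbols[0])
        else:
            break
    if not prefix:                            # nothing to factor
        return [(lhs, alt) for alt in alternatives]
    n = len(prefix)
    return ([(lhs, prefix + [tail_name])] +
            [(tail_name, alt[n:]) for alt in alternatives])   # [] means λ


factored = left_factor(
    "<stmt>",
    [["if", "<exp>", "then", "<stmt>"],
     ["if", "<exp>", "then", "<stmt>", "else", "<stmt>"]],
    "<tail>",
)
for lhs, rhs in factored:
    print(lhs, "->", " ".join(rhs) or "λ")
```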


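2. left recursion: productions of the form

A → A α

Grammars with left-recursive productions are not LL(1), because we may have

A ⇒ A α ⇒ A α α ⇒ ⋯

with the same lookahead at every step: no input is consumed, so the lookahead cannot tell the parser how many times to apply A → A α.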


Solution: replace the productions

A → A α
A → β
A → γ

Intuition: all the strings derivable from A have the form

β, βα, βαα, βααα, …
γ, γα, γαα, γααα, …

So we may use the following productions instead:

A → β T
A → γ T
T → λ
T → α T

Left recursion is thus replaced by right recursion.


Ex. Given the left-recursive grammar:

E → E + T
E → T
T → T * P
T → P
P → ID

After eliminating left recursion, we get

E → T A
A → λ
A → + T A
T → P B
B → λ
B → * P B
P → ID
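The transformation is mechanical enough to code directly. This Python sketch handles immediate left recursion only (A → A α), not left recursion that goes through other nonterminals; `remove_left_recursion` is a hypothetical helper, and λ is represented by an empty list. Applied to E and T it reproduces the productions above.

```python
def remove_left_recursion(lhs, alternatives, new_name):
    """A -> A α1 | ... | β1 | ...   becomes
    A -> β_i new_name,   new_name -> λ | α_j new_name."""
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == lhs]
    others = [alt for alt in alternatives if not alt or alt[0] != lhs]
    if not recursive:                      # no immediate left recursion
        return [(lhs, alt) for alt in alternatives]
    result = [(lhs, beta + [new_name]) for beta in others]
    result.append((new_name, []))          # new_name -> λ
    result += [(new_name, alpha + [new_name]) for alpha in recursive]
    return result


for lhs, rhs in (remove_left_recursion("E", [["E", "+", "T"], ["T"]], "A") +
                 remove_left_recursion("T", [["T", "*", "P"], ["P"]], "B")):
    print(lhs, "->", " ".join(rhs) or "λ")
```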


3. More general solution

Ex.

<stmt> → <label> <unlabeled stmt>
<label> → ID :
<label> → λ
<unlabeled stmt> → ID := <exp> ;

We cannot decide which production to use when <label> is on the stack top and ID is the next token: with <label> → ID : the lookahead ID belongs to the label, while with <label> → λ the lookahead ID belongs to <unlabeled stmt>.


Solution: use the following productions (which essentially look ahead 2 tokens):

<stmt> → ID <suffix>
<suffix> → : <unlabeled stmt>
<suffix> → := <exp> ;
<unlabeled stmt> → ID := <exp> ;

Try two examples:

A: B := C ;
B := C ;
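The two examples can be traced with a Python sketch of a recursive descent parser for the factored grammar. It assumes <exp> → ID (the real <exp> is richer) and tokens given by kind only; `parse_stmt` is a hypothetical helper that returns the productions it applied, showing that the single token after the first ID (":" or ":=") now decides between the two statement forms.

```python
class ParseError(Exception):
    pass


def parse_stmt(tokens):
    """Recursive descent for the factored statement grammar.
    Returns the productions applied, in leftmost-derivation order."""
    toks = list(tokens) + ["$"]
    pos = 0
    used = []

    def peek():
        return toks[pos]

    def match(expected):
        nonlocal pos
        if toks[pos] != expected:
            raise ParseError(f"expected {expected!r}, got {toks[pos]!r}")
        pos += 1

    def exp():                      # <exp> -> ID (simplified for this sketch)
        used.append("<exp> -> ID")
        match("ID")

    def unlabeled_stmt():           # <unlabeled stmt> -> ID := <exp> ;
        used.append("<unlabeled stmt> -> ID := <exp> ;")
        match("ID")
        match(":=")
        exp()
        match(";")

    def suffix():
        if peek() == ":":           # <suffix> -> : <unlabeled stmt>
            used.append("<suffix> -> : <unlabeled stmt>")
            match(":")
            unlabeled_stmt()
        elif peek() == ":=":        # <suffix> -> := <exp> ;
            used.append("<suffix> -> := <exp> ;")
            match(":=")
            exp()
            match(";")
        else:
            raise ParseError(f"unexpected {peek()!r} after ID")

    used.append("<stmt> -> ID <suffix>")
    match("ID")
    suffix()
    match("$")
    return used


# "A: B := C ;" and "B := C ;" as token kinds
labeled = parse_stmt(["ID", ":", "ID", ":=", "ID", ";"])
plain = parse_stmt(["ID", ":=", "ID", ";"])
```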


4. For more difficult cases, we use semantic routines to help parsing.

Ex. In Ada, we may declare arrays as

A: array(I .. J, BOOLEAN)

A straightforward grammar for array bounds is

<bound> → <exp> .. <exp>
<bound> → ID
<exp> → ID
<exp> → …

where ID ∈ First(<exp>). This grammar is not LL(1) because we cannot make a decision when <bound> is on the stack top and ID is the next token.


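Solution:

<bound> → <exp> <tail>
<tail> → λ
<tail> → .. <exp>

A bound with an empty <tail>, such as BOOLEAN, is parsed as an <exp>; a semantic routine can then check whether it names a type.

Every context-free grammar (for a language not containing λ) can be transformed into Greibach Normal Form, in which every production has the form

A → a α   (a is a terminal)

So, given a grammar G, we can do

G → GNF → no common prefix, no left recursion

but the result may still NOT be LL(1)!

Ex.

S → a A a
S → b A b a
A → b
A → λ

Consider the case when A is on the stack top and b is the next token.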


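## §5.7 The dangling-else problem

Consider

if a then if b then x := 1 else x := 2

Two possibilities:

1. the else belongs to the inner if: when a is true, x := 1 if b is true and x := 2 if b is false;
2. the else belongs to the outer if: when a is true, x := 1 if b is true; when a is false, x := 2.

The problem is which "if" the "else" belongs to.

In essence, reading [ as if and ] as else, we are trying to find an LL(1) grammar for the set

$$\{\, \texttt{[}^{\,i}\, \texttt{]}^{\,j} \mid i \ge j \ge 0 \,\}$$

But is it possible?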


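1st attempt: G1

S → [ S C
S → λ
C → ]
C → λ

This grammar is ambiguous. Consider [ [ ]. It has two parse trees: in both, S → [ S C, the inner S → [ S C, and the innermost S → λ; they differ in which C derives the ]:

- the inner C → ] and the outer C → λ (the ] pairs with the second [), or
- the inner C → λ and the outer C → ] (the ] pairs with the first [).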


2nd attempt: we can make ] be associated with the nearest unpaired [ as follows:

S → [ S
S → T
T → [ T ]
T → λ

This grammar is not ambiguous. Consider [ [ ]: its only parse tree is S → [ S, then S → T → [ T ] with T → λ.

However, this grammar is not LL(1) either. Consider the case when S is on the stack top and [ is the next input token:

[ ∈ First([ S) and [ ∈ First(T)

This grammar can be parsed with a bottom-up parser, but not with a top-down parser.
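Ambiguity on a particular string can be checked by counting parse trees. This Python sketch counts, by brute force, the ways S derives a string in G1 and in G2 (memoized with `functools.lru_cache`); a count above 1 means that string has more than one parse tree. It only checks individual strings, so it illustrates, rather than proves, that G2 is unambiguous. The function names are illustrative.

```python
from functools import lru_cache


# G1:  S -> [ S C | λ     C -> ] | λ
@lru_cache(maxsize=None)
def g1_S(s):
    """Number of parse trees in which G1's S derives the string s."""
    count = 1 if s == "" else 0                  # S -> λ
    if s.startswith("["):                        # S -> [ S C
        rest = s[1:]
        for i in range(len(rest) + 1):           # split rest between S and C
            count += g1_S(rest[:i]) * g1_C(rest[i:])
    return count


def g1_C(s):
    return 1 if s in ("]", "") else 0            # C -> ] | λ


# G2:  S -> [ S | T       T -> [ T ] | λ
@lru_cache(maxsize=None)
def g2_T(s):
    if s == "":                                  # T -> λ
        return 1
    if len(s) >= 2 and s[0] == "[" and s[-1] == "]":
        return g2_T(s[1:-1])                     # T -> [ T ]
    return 0


@lru_cache(maxsize=None)
def g2_S(s):
    """Number of parse trees in which G2's S derives the string s."""
    count = g2_T(s)                              # S -> T
    if s.startswith("["):                        # S -> [ S
        count += g2_S(s[1:])
    return count


print(g1_S("[[]"), g2_S("[[]"))   # 2 1
```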


Solution: conflicts + special rules

1. G → S ;
2. S → if S E
3. S → other
4. E → else S
5. E → λ

The parse table:

| | if | else | other | ; |
|---|---|---|---|---|
| G | 1 | | 1 | |
| S | 2 | | 3 | |
| E | | 4, 5 (conflict) | | 5 |

We can enforce T[E, else] = 4 (the 4th rule). This essentially forces "else" to be matched with the nearest unpaired "if".
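Forcing T[E, else] = 4 amounts to letting E take the else whenever it sees one. Below is a Python sketch of a recursive descent parser for rules 1 to 5 with that choice built in; representing an if-statement as the tuple `("if", then_part, else_part)` is an illustrative choice, not taken from the book.

```python
class ParseError(Exception):
    pass


def parse_program(tokens):
    """Parse G -> S ;  with T[E, else] forced to rule 4 (E -> else S).
    An if-statement is returned as ("if", then_part, else_part_or_None)."""
    toks = list(tokens) + ["$"]
    pos = 0

    def peek():
        return toks[pos]

    def match(expected):
        nonlocal pos
        if toks[pos] != expected:
            raise ParseError(f"expected {expected!r}, got {toks[pos]!r}")
        pos += 1

    def S():
        if peek() == "if":              # rule 2: S -> if S E
            match("if")
            then_part = S()
            return ("if", then_part, E())
        if peek() == "other":           # rule 3: S -> other
            match("other")
            return "other"
        raise ParseError(f"unexpected {peek()!r}")

    def E():
        if peek() == "else":            # rule 4, chosen over rule 5 on "else"
            match("else")
            return S()
        if peek() == ";":               # rule 5: E -> λ
            return None
        raise ParseError(f"unexpected {peek()!r}")

    tree = S()                          # rule 1: G -> S ;
    match(";")
    match("$")
    return tree


print(parse_program(["if", "if", "other", "else", "other", ";"]))
```

The else ends up inside the inner if, giving `('if', ('if', 'other', 'other'), None)`.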


Alternative solution: change the language. Add "end if" at the end of every "if":

S → if S E
S → other
E → else S end if
E → end if


## §5.9 Properties of LL(1) parsers

- A correct leftmost parse is guaranteed.
- All LL(1) grammars are unambiguous.
- Parsing takes linear time and linear space.


## § llgen

Page 776 of the book

Output from llgen:

*define

decrtn 1

ifprocess 2


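## § LL(k) parsing

Recall that a grammar is LL(1) only if, for any two distinct productions A → α and A → β,

$$\text{First}(\alpha\,\text{Follow}(A)) \cap \text{First}(\beta\,\text{Follow}(A)) = \emptyset$$

To generalize, we say that G is strong LL(k) if, for any two distinct productions A → α and A → β,

$$\text{First}_k(\alpha\,\text{Follow}_k(A)) \cap \text{First}_k(\beta\,\text{Follow}_k(A)) = \emptyset$$

The word "strong" means that this condition is stronger than necessary: Follow_k(A) lumps together all the contexts in which A can appear.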


Consider

G → S $
S → a A a
S → b A b a
A → b
A → λ

- This grammar is not LL(1). When A is on the stack top and b is the next token, we cannot choose between A → b and A → λ (stack top: A; input: b …).

- Does it help if we can look ahead two tokens? NO! If the next two tokens are bb, then we should choose A → b. If the next two tokens are ba, then we cannot make a choice.


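Case 1. The input is aba.

- The stack holds S $ and the lookahead is ab, so we expand S → a A a; the stack becomes a A a $.
- Match a; the stack becomes A a $.
- The lookahead is now ba. At this point, we should choose A → b.

Case 2. The input is bba.

- The stack holds S $ and the lookahead is bb, so we expand S → b A b a; the stack becomes b A b a $.
- Match b; the stack becomes A b a $.
- The lookahead is now ba. At this point, we should choose A → λ.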


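So the problem is not the limited number of lookahead tokens. The problem is in the "context": in both cases A is on the stack top with lookahead ba, but the symbols below A differ (a $ in case 1, b a $ in case 2).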


Therefore, the grammar is not strong LL(1).

Actually, we can verify that the grammar is not strong LL(k) for any k ≥ 1 by verifying that

$$\text{First}_k(ba\$) \in \text{First}_k(b\,\text{Follow}_k(A)) \cap \text{First}_k(\lambda\,\text{Follow}_k(A)) \quad \text{for all } k \ge 1$$
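The check is easy to run for small k. In this Python sketch the right contexts of A (a$ and ba$) are derived by hand from the grammar rather than computed, and First_k of a terminal string is just its first k symbols; `strong_llk_conflict` is a hypothetical helper returning First_k(b Follow_k(A)) ∩ First_k(λ Follow_k(A)).

```python
# For G -> S $, S -> a A a | b A b a, A -> b | λ, the terminal strings that
# can follow A are "a$" (from S -> a A a) and "ba$" (from S -> b A b a).
RIGHT_CONTEXTS = {"a$", "ba$"}


def first_k(strings, k):
    """First_k of a set of terminal strings: keep at most k symbols of each."""
    return {s[:k] for s in strings}


def concat_k(prefix, strings, k):
    """First_k(prefix · L) for a terminal string prefix and a set L."""
    return {(prefix + s)[:k] for s in strings}


def strong_llk_conflict(k):
    follow_k = first_k(RIGHT_CONTEXTS, k)        # Follow_k(A)
    via_b = concat_k("b", follow_k, k)           # First_k(b Follow_k(A))
    via_lambda = concat_k("", follow_k, k)       # First_k(λ Follow_k(A))
    return via_b & via_lambda


for k in range(1, 5):
    print(k, strong_llk_conflict(k))   # 1 {'b'}, 2 {'ba'}, 3 {'ba$'}, 4 {'ba$'}
```

The intersection is never empty, and it always contains exactly First_k(ba$).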


However, it is possible to parse the language of the grammar under the following conditions:

1. look ahead two tokens,
2. scan from left to right,
3. use the left context.

We call such grammars LL(2), rather than strong LL(2). Note that

LL(2) ≠ strong LL(2), but LL(1) = strong LL(1).


LL(k) parsers: each nonterminal A is split into states

$$[A, L_1],\ [A, L_2],\ \ldots$$

where each $L_i$ is a set of terminal strings of length ≤ k.

Let [A,L] be the state on top of the stack, and let z be the lookahead (|z| = k). At this point, we choose production A → α only if $z \in \text{First}_k(\alpha y)$ for some $y \in L$.

Note. If there exist a state [A,L] and two productions A → α, A → β such that

$$\Big(\bigcup_{y \in L} \text{First}_k(\alpha y)\Big) \cap \Big(\bigcup_{y \in L} \text{First}_k(\beta y)\Big) \neq \emptyset$$

then the grammar is not LL(k).


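When [A,L] is the state on the stack top, assume we choose the production A → α. Let

$$\alpha = X_0 B_1 X_1 \cdots B_m X_m$$

where the $X_i$ are terminal strings and the $B_i$ are nonterminals. Pop [A,L] from the stack and push

$$X_0\, [B_1, L_1]\, X_1 \cdots [B_m, L_m]\, X_m$$

onto the stack, where

$$L_i = \bigcup_{y \in L} \text{First}_k(X_i B_{i+1} X_{i+1} \cdots B_m X_m\, y)$$

That is, $L_i$ collects the k-token prefixes of whatever can follow $B_i$: the rest of α after $B_i$, followed by some y ∈ L (the context of [A,L]).

The start symbol is [S,{λ}].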
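The construction above can be sketched in Python for the example grammar of the next slides. Assumptions: terminals are single characters, `""` stands for λ, and a state [A,L] is the pair `(A, frozenset(L))`; `build_table`, `conflicts` and the other names are illustrative. With k = 2 it should reproduce the seven productions derived below with no conflicts; with k = 1 it reports a conflict for A on lookahead b.

```python
# The grammar of this example; single-character terminals, "" is λ.
GRAMMAR = {
    "G": [["S", "$"]],
    "S": [["a", "A", "a"], ["b", "A", "b", "a"]],
    "A": [["b"], []],
}


def concat_k(xs, ys, k):
    """k-truncated concatenation {(x y)[:k] | x in xs, y in ys}."""
    return {(x + y)[:k] for x in xs for y in ys}


def first_k_seq(symbols, first, k):
    """First_k of a string of grammar symbols."""
    result = {""}
    for sym in symbols:
        result = concat_k(result, first[sym] if sym in GRAMMAR else {sym}, k)
    return result


def compute_first(k):
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:                          # fixed-point iteration
        changed = False
        for lhs, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                new = first_k_seq(rhs, first, k)
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first


def build_table(k, start="G"):
    """Map (state, lookahead) -> list of right-hand sides, where a state
    is (A, L) and nonterminals in right-hand sides are replaced by states."""
    first = compute_first(k)
    start_state = (start, frozenset({""}))
    table, todo, seen = {}, [start_state], {start_state}
    while todo:
        state = todo.pop()
        lhs, context = state
        for rhs in GRAMMAR[lhs]:
            new_rhs = []
            for i, sym in enumerate(rhs):
                if sym not in GRAMMAR:
                    new_rhs.append(sym)
                    continue
                # L_i: k-prefixes of (rest of rhs after sym) followed by y in L
                rest = first_k_seq(rhs[i + 1:], first, k)
                sub_state = (sym, frozenset(concat_k(rest, context, k)))
                new_rhs.append(sub_state)
                if sub_state not in seen:
                    seen.add(sub_state)
                    todo.append(sub_state)
            # lookaheads z in First_k(rhs y) for some y in L
            for z in concat_k(first_k_seq(rhs, first, k), context, k):
                table.setdefault((state, z), []).append(new_rhs)
    return table


def conflicts(table):
    return [key for key, alternatives in table.items() if len(alternatives) > 1]
```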


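Ex.

G → S $
S → a A a
S → b A b a
A → b
A → λ

1. $\text{First}_2(A) = \{b, \lambda\}$, $\text{First}_2(S) = \{ab, aa, bb\}$, $\text{First}_2(G) = \text{First}_2(S) = \{ab, aa, bb\}$.

2. $[G,\{\lambda\}]$ is the start symbol.

3. Consider the production $G \to S\,\$\,$ in state $[G,\{\lambda\}]$:
   - it is predicted by $z \in \text{First}_2(S\$\lambda) = \{ab, aa, bb\}$;
   - $L_1 = \text{First}_2(\$\lambda) = \{\$\}$.

   This means $[G,\{\lambda\}] \to [S,\{\$\}]\,\$\,$ on lookaheads ab, aa, bb.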


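4. Consider the production S → a A a in state $[S,\{\$\}]$:
   - it is predicted by $z \in \text{First}_2(aAa\$) = \{ab, aa\}$;
   - $L_1 = \text{First}_2(a\$) = \{a\$\}$.

   This means $[S,\{\$\}] \to a\,[A,\{a\$\}]\,a$ on lookaheads ab, aa.

5. Consider the production S → b A b a in state $[S,\{\$\}]$:
   - it is predicted by $z \in \text{First}_2(bAba\$) = \{bb\}$;
   - $L_1 = \text{First}_2(ba\$) = \{ba\}$.

   This means $[S,\{\$\}] \to b\,[A,\{ba\}]\,b\,a$ on lookahead bb.

6. Consider A → b in state $[A,\{a\$\}]$:
   - it is predicted by $z \in \text{First}_2(ba\$) = \{ba\}$;
   - there is no $L_i$ (the right-hand side has no nonterminal).

   This means $[A,\{a\$\}] \to b$ on lookahead ba.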


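7. Consider A → b in state $[A,\{ba\}]$:
   - it is predicted by $z \in \text{First}_2(bba) = \{bb\}$;
   - there is no $L_i$.

   This means $[A,\{ba\}] \to b$ on lookahead bb.

8. Consider A → λ in state $[A,\{a\$\}]$:
   - it is predicted by $z \in \text{First}_2(a\$) = \{a\$\}$;
   - there is no $L_i$.

   This means $[A,\{a\$\}] \to \lambda$ on lookahead $a\$\,$.

9. Consider A → λ in state $[A,\{ba\}]$:
   - it is predicted by $z \in \text{First}_2(ba) = \{ba\}$;
   - there is no $L_i$.

   This means $[A,\{ba\}] \to \lambda$ on lookahead ba.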


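In summary, we have 7 productions:

| Lookahead | Production |
|---|---|
| ab, aa, bb | $[G,\{\lambda\}] \to [S,\{\$\}]\,\$\,$ |
| ab, aa | $[S,\{\$\}] \to a\,[A,\{a\$\}]\,a$ |
| bb | $[S,\{\$\}] \to b\,[A,\{ba\}]\,b\,a$ |
| ba | $[A,\{a\$\}] \to b$ |
| $a\$\,$ | $[A,\{a\$\}] \to \lambda$ |
| bb | $[A,\{ba\}] \to b$ |
| ba | $[A,\{ba\}] \to \lambda$ |

Note that no two productions for the same state share a lookahead, so there is no conflict. Therefore, the grammar is LL(2).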


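Now let's parse the string abba$:

$$[G,\{\lambda\}] \Rightarrow [S,\{\$\}]\,\$ \Rightarrow a\,[A,\{a\$\}]\,a\,\$\,$$

After matching a, the lookahead is bb, and there is no entry for $[A,\{a\$\}]$ on bb, so the parse fails.

Parse bbba$:

$$[G,\{\lambda\}] \Rightarrow [S,\{\$\}]\,\$ \Rightarrow b\,[A,\{ba\}]\,b\,a\,\$ \Rightarrow b\,b\,b\,a\,\$\,$$

Match the 1st b; the lookahead bb then selects $[A,\{ba\}] \to b$; match the 2nd b, the 3rd b, and a.

Do you DARE to try exercise 11 on page 139?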
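A table-driven LL(2) driver can be sketched by keying the table on (state, two-token lookahead). This Python sketch hard-codes the seven productions from the summary above (state names written as plain strings, an assumption of the example), treats each character of the input as one token, and raises an error where the traces above say the parse fails; `ll2_parse` is a hypothetical name.

```python
class ParseError(Exception):
    pass


# The seven LL(2) productions, keyed by (state, two-token lookahead).
TABLE = {
    ("[G,{λ}]", "ab"): ["[S,{$}]", "$"],
    ("[G,{λ}]", "aa"): ["[S,{$}]", "$"],
    ("[G,{λ}]", "bb"): ["[S,{$}]", "$"],
    ("[S,{$}]", "ab"): ["a", "[A,{a$}]", "a"],
    ("[S,{$}]", "aa"): ["a", "[A,{a$}]", "a"],
    ("[S,{$}]", "bb"): ["b", "[A,{ba}]", "b", "a"],
    ("[A,{a$}]", "ba"): ["b"],
    ("[A,{a$}]", "a$"): [],
    ("[A,{ba}]", "bb"): ["b"],
    ("[A,{ba}]", "ba"): [],
}
STATES = {state for state, _ in TABLE}


def ll2_parse(text):
    """Parse text (one character per token, ending in '$').
    Returns the matched tokens, or raises ParseError."""
    stack = ["[G,{λ}]"]
    pos = 0
    matched = []
    while stack:
        x = stack.pop()
        if x in STATES:                          # expand a state [A,L]
            lookahead = text[pos:pos + 2]
            rhs = TABLE.get((x, lookahead))
            if rhs is None:
                raise ParseError(f"fail: no entry for {x} on {lookahead!r}")
            stack.extend(reversed(rhs))          # leftmost symbol on top
        elif text[pos:pos + 1] == x:             # match a terminal
            matched.append(x)
            pos += 1
        else:
            raise ParseError(f"fail: expected {x!r} at position {pos}")
    if pos != len(text):
        raise ParseError("fail: input left over")
    return matched
```

`ll2_parse("bbba$")` follows the second trace above, and `ll2_parse("abba$")` stops at $[A,\{a\$\}]$ with lookahead bb.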


Some results:

- LL(k) ⊊ LL(k+1)
- strong LL(k) ⊊ strong LL(k+1)
- strong LL(k) ⊊ LL(k) for all k > 1
- strong LL(1) = LL(1)
- $L_k = \{\, a^n w \mid n \ge 1,\ w \in \{b,\ b^k d\}^n \,\}$ needs k-token lookahead.
- Strong LL(1)'s table is larger.
- error detection