Chap 5 LL(1) Parsing
YANG

LL(1): Left-to-right scanning, Leftmost derivation, 1-token lookahead.
With a parser generator, parsing becomes easy, and modifying a parser is also convenient.

Given the productions
  A → α1
  A → α2
  ...
  A → αn
a (leftmost) derivation can rewrite ... A ... as ... α1 ... or ... α2 ... or ... αn .... Which route should we choose? (Trial and error is not a good idea.) Use the lookahead symbols.

Consider the situation: we are about to expand a nonterminal A, and there are several productions whose left-hand side is A:
  A → α1 | α2 | ... | αn
We choose one of the productions based on the lookahead token. Which one should we choose? Consider First(α1), First(α2), ..., First(αn), and if αi ⇒* λ, consider also Follow(A).

Define
  Predict(A → α) = (First(α) − {λ}) ∪ (if λ ∈ First(α) then Follow(A) else ∅)
If the lookahead token a ∈ Predict(A → α), then we use the production A → α to expand A.
What if a ∈ Predict(A → α1) and a ∈ Predict(A → α2)?
What if a ∉ Predict(A → α) for every production A → α whose left-hand side is A?

Property of LL(1) grammars: if a grammar is LL(1), then for any two productions A → α and A → β,
  First(α Follow(A)) ∩ First(β Follow(A)) = ∅

Figure 5.1: A Micro grammar in standard form.
Given the First and Follow sets in Figures 5.2 and 5.3, calculate the Predict set for each production.

5.2 LL(1) Parse Table
The Predict function may be represented as an LL(1) parse table
  T: Vn × Vt → P ∪ {error}
where
  T[A, a] = A → α   if a ∈ Predict(A → α)
          = error   otherwise
Each entry holds a production number or the error flag, for example:
          a       b     ...
  A       3
  B       error
A grammar is LL(1) iff every entry in the parse table contains a unique production or the error flag.
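The Predict and parse-table definitions above can be sketched in a few lines of Python. This is a minimal sketch, not the book's implementation: the grammar encoding (a list of (lhs, rhs) pairs with () standing for λ), the helper names, and the use of "$" as the end marker in Follow(start) are assumptions. It builds the table for the expression grammar without left recursion from Section 5.6, which has no conflicts, and for the if-else grammar of Section 5.7, where it reports the conflict in T[E, else].

```python
# Minimal sketch: First, Follow, Predict and the LL(1) table.
# Assumed encoding (not from the slides): a grammar is a list of
# (lhs, rhs) pairs, rhs is a tuple of symbols, () stands for lambda,
# and "$" is the end-of-input marker placed in Follow(start).
from collections import defaultdict

LAMBDA = ""  # marks the empty string inside First sets


def first_of(symbols, first, nonterms):
    """First(X1...Xm); contains LAMBDA iff X1...Xm =>* lambda."""
    result = set()
    for sym in symbols:
        if sym not in nonterms:          # a terminal starts the string
            result.add(sym)
            return result
        result |= first[sym] - {LAMBDA}
        if LAMBDA not in first[sym]:     # sym cannot vanish: stop here
            return result
    result.add(LAMBDA)                   # every symbol can vanish
    return result


def first_and_follow(grammar, start):
    nonterms = {lhs for lhs, _ in grammar}
    first, follow = defaultdict(set), defaultdict(set)
    follow[start].add("$")
    changed = True
    while changed:                       # iterate to a fixed point
        changed = False
        for lhs, rhs in grammar:
            new = first_of(rhs, first, nonterms)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
            for i, sym in enumerate(rhs):
                if sym in nonterms:
                    rest = first_of(rhs[i + 1:], first, nonterms)
                    new = rest - {LAMBDA}
                    if LAMBDA in rest:   # rest can vanish: add Follow(lhs)
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return first, follow, nonterms


def ll1_table(grammar, start):
    first, follow, nonterms = first_and_follow(grammar, start)
    table = defaultdict(list)            # (A, a) -> production numbers
    for n, (lhs, rhs) in enumerate(grammar, start=1):
        f = first_of(rhs, first, nonterms)
        predict = f - {LAMBDA}
        if LAMBDA in f:                  # Predict adds Follow(A)
            predict |= follow[lhs]
        for a in sorted(predict):
            table[(lhs, a)].append(n)
    conflicts = {key: ps for key, ps in table.items() if len(ps) > 1}
    return dict(table), conflicts


# Expression grammar after removing left recursion (Section 5.6).
EXPR = [("E", ("T", "A")), ("A", ("+", "T", "A")), ("A", ()),
        ("T", ("P", "B")), ("B", ("*", "P", "B")), ("B", ()),
        ("P", ("ID",))]
# If-else grammar of Section 5.7 (productions 1-5).
IF_ELSE = [("G", ("S", ";")), ("S", ("if", "S", "E")), ("S", ("other",)),
           ("E", ("else", "S")), ("E", ())]

table, conflicts = ll1_table(EXPR, "E")
print(conflicts)                # {}: every entry is unique, so LL(1)
print(table[("B", "+")])        # [6]: B -> lambda is predicted by +
_, conflicts = ll1_table(IF_ELSE, "G")
print(conflicts)                # {('E', 'else'): [4, 5]}
```

Computing First and Follow in one joint loop is a design choice for brevity; both sets only grow, so the loop stops at the same fixed point as two separate passes would.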
Figure 5.5: The LL(1) table for Micro.

5.3 LL(1) parsers
Similar to scanners, there are two kinds of parsers:
1. built-in: recursive descent
2. table-driven

1. Built-in (recursive descent)

  stmt()
  {
      /* here next_token() only peeks at the next token;
         match() is what consumes it */
      token = next_token();
      switch (token) {
      case ID:        /* production 5: <stmt> --> ID := <exp> ; */
          match(ID); match(ASSIGN); exp(); match(SEMICOLON);
          break;
      case READ:      /* production 6 */
          ...
      case WRITE:     /* production 7 */
          ...
      default:
          syntax_error(....);
      }
  }

These recursive descent parsing procedures can clearly be generated automatically from the grammar:
  grammar → LL(1) table → parser generator → recursive descent parser
However, it is difficult for the parser generator to integrate the semantic routines into the generated recursive descent parser automatically.

2. Table-driven parser
(+) Generic driver: only the LL(1) table needs to be changed when the grammar is modified.
(+) Non-recursive (faster): the parser maintains a stack itself, so there are no recursive calls.

  lldriver()
  {
      push(START_SYMBOL);
      a := next_token();
      while stack is not empty do {
          X := symbol on stack top;
          if (X is a nonterminal && T[X, a] == X → Y1 ... Ym) {
              pop(1);
              push Ym, Ym-1, ..., Y1;      /* Y1 ends up on top */
          } else if (X == a) {             /* terminal matches the lookahead */
              pop(1);
              a := next_token();
          } else if (X is an action symbol) {
              pop(1);
              call the corresponding semantic routine;
          } else
              syntax_error();
      }
  }

Example input: begin A := B - 3 + A; end $
Initially the lookahead is a = begin and the parse stack holds only <GOAL>, so X = <GOAL>. Trace the action of the parser on this example.

5.5 Action symbols
Action symbols may be processed by the parser in a similar way.
1.
In recursive descent parsers
Example: gen_action(ID := <exp> #assign ;) will generate the following code:

  match(ID); match(ASSIGN); exp(); assign(); match(SEMICOLON);

Parameters are transmitted through a semantic stack. The semantic stack is a stack of semantic records; the parser stack is a stack of grammar (and action) symbols.

2. In the LL(1) driver
Action symbols are pushed onto the parse stack in the same way as grammar symbols. When an action symbol is on the stack top, the driver calls the corresponding semantic routine (see lldriver in Section 5.3). Parameters are transmitted through the semantic stack.

5.6 Making grammars LL(1)
Not all grammars are LL(1). However, some non-LL(1) grammars can be made LL(1) by simple modifications.
When is a grammar not LL(1)? When some entry in the parse table contains more than one production. For example:

              ...   ID    ...
  <stmt>      ...   2,5   ...

This is called a conflict: we do not know which production to use when <stmt> is on the stack top and ID is the next input token.

Conflicts are classified into two categories:
1. common prefix
2. left recursion

Common prefix
Example:
  <stmt> → if <exp> then <stmt>
  <stmt> → if <exp> then <stmt> else <stmt>
When <stmt> is on the stack top and if is the next input token, we cannot choose which production to use.
In general, if we have two productions A → α and A → β with First(α) ∩ First(β) ≠ ∅, then we have a conflict.

Solution: factor out the common prefix. Example:
  <stmt> → if <exp> then <stmt> <tail>
  <tail> → λ
  <tail> → else <stmt>

2.
Left recursion: productions of the form
  A → A α
Grammars with left-recursive productions are not LL(1), because we may have
  A ⇒ A α ⇒ A α α ⇒ ...
with the same lookahead: the parser cannot tell how many times to apply A → A α.

Solution: replace the productions
  A → A α
  A → β
Intuition: all the strings derivable from A have the form β, βα, βαα, βααα, ..., that is, β followed by one of λ, α, αα, ααα, .... So we may use the following productions instead:
  A → β T
  T → α T
  T → λ
Left recursion is thus replaced by right recursion.

Example. Given the left-recursive grammar
  E → E + T
  E → T
  T → T * P
  T → P
  P → ID
after eliminating left recursion we get
  E → T A
  A → λ
  A → + T A
  T → P B
  B → λ
  B → * P B
  P → ID

3. A more general solution. Example:
  <stmt> → <label> <unlabeled stmt>
  <label> → ID :
  <label> → λ
  <unlabeled stmt> → ID := <exp> ;
We cannot decide which production to use when <label> is on the stack top and ID is the next token: with <label> → ID : the ID belongs to the label, while with <label> → λ the ID is the first token of <unlabeled stmt>.

Solution: use the following productions (which essentially look ahead two tokens):
  <stmt> → ID <suffix>
  <suffix> → : <unlabeled stmt>
  <suffix> → := <exp> ;
  <unlabeled stmt> → ID := <exp> ;
Try two examples: "A: B := C ;" and "B := C ;".

4. For more difficult cases, we use semantic routines to help parsing. Example: in Ada, we may declare arrays as
  A: array(I .. J, BOOLEAN)
A straightforward grammar for an array bound is
  <bound> → <exp> .. <exp>
  <bound> → ID
Since ID ∈ First(<exp>), this grammar is not LL(1): we cannot make a decision when <bound> is on the stack top and ID is the next token.

Solution:
  <bound> → <exp> <tail>
  <tail> → λ
  <tail> → .. <exp>
When <tail> → λ is used, a semantic routine can check that the <exp> is really a type name such as BOOLEAN.

Every context-free grammar whose language does not contain λ can be transformed into Greibach Normal Form (GNF), in which every production has the form
  A → a α   (a is a terminal)
So given a grammar G, we can go from G to GNF and obtain a grammar with no common prefix and no left recursion, but it may still NOT be LL(1)! Example:
  S → a A a
  S → b A b a
  A → b
  A → λ
Consider A on the stack top with b as the next token: both A → b and A → λ apply.

5.7 The dangling-else problem
Consider
  if a then if b then x := 1 else x := 2
There are two possibilities: the else belongs either to the inner if (x := 2 is executed when a is true and b is false) or to the outer if (x := 2 is executed when a is false).
[Figure: the two parse trees for this statement.]
The problem is which if the else belongs to. In essence, we are trying to find an LL(1) grammar for the set
  { [^i ]^j | i ≥ j ≥ 0 }
(read [ as if and ] as else). But is it possible?

1st attempt: G1
  S → [ S C
  S → λ
  C → ]
  C → λ
This grammar is ambiguous. Consider [ [ ]: in S ⇒ [ S C ⇒ [ [ S C C, the ] can come either from the inner C or from the outer C, giving two different parse trees.

2nd attempt: we can make ] be associated with the nearest unpaired [ as follows:
  S → [ S
  S → T
  T → [ T ]
  T → λ
This grammar is not ambiguous. Consider [ [ ]:
  S ⇒ [ S ⇒ [ T ⇒ [ [ T ] ⇒ [ [ ]
However, this grammar is not LL(1) either. Consider the case when S is on the stack top and [ is the next input token:
  [ ∈ First([ S)   and   [ ∈ First(T)
This grammar can be parsed with a bottom-up parser, but not with a top-down parser.

Solution: conflicts + special rules
  1. G → S ;
  2. S → if S E
  3. S → other
  4. E → else S
  5. E → λ
The parse table:
          if    else   other   ;
  G       1            1
  S       2            3
  E             4,5            5
The entry T[E, else] = 4,5 is a conflict. We can enforce T[E, else] = rule 4. This essentially forces each else to be matched with the nearest unpaired if.

Alternative solution: change the language. Add end if at the end of every if:
  S → if S E
  S → other
  E → else S end if
  E → end if

5.9 Properties of LL(1) parsers:
A correct leftmost parse is guaranteed.
All LL(1) grammars are unambiguous.
Linear time and linear space.

llgen: see page 776 of the book. Output from llgen:
  #define decrtn 1
  #define ifprocess 2
  ...

LL(k) parsing
Recall that a grammar is LL(1) only if, for any two productions A → α and A → β,
  First(α Follow(A)) ∩ First(β Follow(A)) = ∅
To generalize, G is strong LL(k) if, for any two productions A → α and A → β,
  First_k(α Follow_k(A)) ∩ First_k(β Follow_k(A)) = ∅
The word strong means that this condition is stronger than necessary: Follow_k(A) collects the right context of every occurrence of A, not just the occurrence being expanded.

Consider G:
  G → S $
  S → a A a
  S → b A b a
  A → b
  A → λ
This grammar is not LL(1): when A is on the stack top and b is the next token, we cannot choose between A → b and A → λ.
  stack: A ...     input: b ...
Does it help if we can look ahead two tokens? NO!
If the next two tokens are bb, we should choose A → b.
If the next two tokens are ba, we cannot make a choice:

Case 1: input is aba$. After G → S $ (lookahead ab) and S → a A a, the parser matches a, leaving A a $ on the stack with lookahead ba. At this point we should choose A → b.
Case 2: input is bba$. After G → S $ (lookahead bb) and S → b A b a, the parser matches b, leaving A b a $ on the stack with lookahead ba. At this point we should choose A → λ.

So the problem is not the limited number of lookahead tokens. The problem is the context.

Therefore, the grammar is not strong LL(1). Actually, we can verify that the grammar is not strong LL(k) for any k ≥ 1 by verifying that
  First_k(ba$) ⊆ First_k(b Follow_k(A)) ∩ First_k(Follow_k(A))   for all k ≥ 1
The two sets on the right are the strong LL(k) sets for A → b and A → λ, so they always intersect.

However, it is possible to parse the language of this grammar under the following conditions:
1. look ahead two tokens,
2. scan from left to right,
3. use the left context.
We call such grammars LL(2), rather than strong LL(2).
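The strong LL(k) condition can be checked mechanically. The sketch below is a hedged illustration, not the book's algorithm: the tuple encoding of terminal strings, the helper names, and the choice to give the start symbol the follow set {λ} (the $ is already explicit in G → S $) are assumptions. It computes First_k and Follow_k by fixed-point iteration and reports every pair of productions whose First_k(α Follow_k(A)) and First_k(β Follow_k(A)) sets intersect.

```python
# Minimal sketch: checking the strong LL(k) condition
#   First_k(alpha Follow_k(A)) ∩ First_k(beta Follow_k(A)) = ∅.
# Assumed encoding: terminal strings are tuples, () is lambda,
# and each rhs is a tuple of symbols.

def concat_k(xs, ys, k):
    """k-prefixes of all concatenations x y (x in xs, y in ys)."""
    return {(x + y)[:k] for x in xs for y in ys}


def first_k_of(symbols, first, k):
    result = {()}
    for sym in symbols:
        part = first[sym] if sym in first else {(sym,)}  # terminal
        result = concat_k(result, part, k)
    return result


def strong_llk_conflicts(grammar, start, k):
    nonterms = {lhs for lhs, _ in grammar}
    first = {a: set() for a in nonterms}
    follow = {a: set() for a in nonterms}
    follow[start].add(())          # assumption: nothing follows the start symbol
    changed = True
    while changed:                 # joint fixed point for First_k and Follow_k
        changed = False
        for lhs, rhs in grammar:
            new = first_k_of(rhs, first, k)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
            for i, sym in enumerate(rhs):
                if sym in nonterms:
                    new = concat_k(first_k_of(rhs[i + 1:], first, k),
                                   follow[lhs], k)
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    conflicts = []
    for i, (a, alpha) in enumerate(grammar):
        for b, beta in grammar[i + 1:]:
            if a == b:             # same left-hand side: compare the sets
                p = concat_k(first_k_of(alpha, first, k), follow[a], k)
                q = concat_k(first_k_of(beta, first, k), follow[a], k)
                if p & q:
                    conflicts.append((a, alpha, beta, p & q))
    return conflicts


GRAMMAR = [("G", ("S", "$")), ("S", ("a", "A", "a")),
           ("S", ("b", "A", "b", "a")), ("A", ("b",)), ("A", ())]

# Prints one line per k, always for the pair A -> b versus A -> lambda.
for k in range(1, 5):
    for lhs, alpha, beta, common in strong_llk_conflicts(GRAMMAR, "G", k):
        print(k, lhs, alpha, beta, sorted(common))
```

For each k tried, the common set is exactly the k-prefix of ba$, matching the argument that more lookahead alone cannot separate the two A productions.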
Note that LL(2) ≠ strong LL(2), whereas LL(1) = strong LL(1).

LL(k) parsers
Each nonterminal A is split into
  [A, L1], [A, L2], ...
where each Li is a set of terminal strings of length ≤ k.
Let [A, L] be the nonterminal on top of the stack, and let z be the lookahead (|z| = k). At this point, we choose the production A → α only if z ∈ First_k(α y) for some y ∈ L.
Note: if there exist a state [A, L] and two productions A → α, A → β such that
  (∪_{y∈L} First_k(α y)) ∩ (∪_{y∈L} First_k(β y)) ≠ ∅
then the grammar is not LL(k).

When [A, L] is the state on the stack top, assume we choose the production A → α. Let
  α = X0 B1 X1 B2 ... Bm Xm
where the Xi are terminal strings and the Bi are nonterminals. Pop [A, L] from the stack and push
  X0 [B1, L1] X1 [B2, L2] ... [Bm, Lm] Xm
onto the stack, where
  Li = ∪_{y∈L} First_k(Xi B_{i+1} X_{i+1} ... Bm Xm y)
The start symbol is [S, {λ}].

Example. G:
  G → S $
  S → a A a
  S → b A b a
  A → b
  A → λ
1. First_2(A) = { b, λ }
   First_2(S) = { ab, aa, bb }
   First_2(G) = First_2(S$) = { ab, aa, bb }
2. [G, {λ}] is the start symbol.
3. Consider the production G → S $:
   z ∈ First_2(S$) = { ab, aa, bb }
   L1 = First_2($) = { $ }
   This means: on lookahead ab, aa, or bb, [G, {λ}] → [S, {$}] $
4. Consider the production S → a A a in state [S, {$}]:
   z ∈ First_2(aAa$) = { ab, aa }
   L1 = First_2(a$) = { a$ }
   This means: on ab or aa, [S, {$}] → a [A, {a$}] a
5. Consider the production S → b A b a in state [S, {$}]:
   z ∈ First_2(bAba$) = { bb }
   L1 = First_2(ba$) = { ba }
   This means: on bb, [S, {$}] → b [A, {ba}] b a
6. Consider A → b in state [A, {a$}]:
   z ∈ First_2(ba$) = { ba }; no L1 (the right-hand side has no nonterminal)
   This means: on ba, [A, {a$}] → b
7. Consider A → b in state [A, {ba}]:
   z ∈ First_2(bba) = { bb }; no L1
   This means: on bb, [A, {ba}] → b
8. Consider A → λ in state [A, {a$}]:
   z ∈ First_2(a$) = { a$ }; no L1
   This means: on a$, [A, {a$}] → λ
9.
Consider A → λ in state [A, {ba}]:
   z ∈ First_2(ba) = { ba }; no L1
   This means: on ba, [A, {ba}] → λ

In summary, we have 7 productions:
  lookahead     production
  ab, aa, bb    [G, {λ}] → [S, {$}] $
  ab, aa        [S, {$}] → a [A, {a$}] a
  bb            [S, {$}] → b [A, {ba}] b a
  ba            [A, {a$}] → b
  a$            [A, {a$}] → λ
  bb            [A, {ba}] → b
  ba            [A, {ba}] → λ
Note that there is no conflict on the lookaheads. Therefore, the grammar is LL(2).

Now let's parse the string abba$:
  [G, {λ}]
  ⇒ [S, {$}] $             (lookahead ab)
  ⇒ a [A, {a$}] a $        (lookahead ab)
  match a; [A, {a$}] has no entry for lookahead bb, so the parse fails.

Parse bbba$:
  [G, {λ}]
  ⇒ [S, {$}] $             (lookahead bb)
  ⇒ b [A, {ba}] b a $      (lookahead bb)
  match the 1st b; stack: [A, {ba}] b a $, lookahead bb
  ⇒ b b a $                ([A, {ba}] → b)
  match the 2nd b, match the 3rd b, match a, match $: the input is accepted.

Do you DARE to try exercise 11 on page 139?

Some results:
  LL(k) ⊂ LL(k+1)
  strong LL(k) ⊂ strong LL(k+1)
  strong LL(k) ⊂ LL(k) for all k > 1
  strong LL(1) = LL(1)
  L_k = { a^n (b | b^k d)^n | n ≥ 1 } needs k-token lookahead.
  Strong LL(1) and LL(1) parsers differ in table size and in error detection.
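The LL(2) table in the summary can drive a parser directly. Below is a minimal Python sketch of such a driver, assuming single-character tokens, string names such as "[A,{a$}]" for the split nonterminals, and a lookahead of the next two input characters (fewer at the very end of the input). It reproduces the two traces above: abba$ fails and bbba$ is accepted.

```python
# Minimal sketch of a driver for the LL(2) table summarized above.
# Assumptions: tokens are single characters, split nonterminals are
# named by strings such as "[A,{a$}]", and the lookahead is the next
# K input characters (fewer near the end of the input).
K = 2
TABLE = {  # (state on stack top, lookahead) -> right-hand side
    ("[G,{λ}]", "ab"): ["[S,{$}]", "$"],
    ("[G,{λ}]", "aa"): ["[S,{$}]", "$"],
    ("[G,{λ}]", "bb"): ["[S,{$}]", "$"],
    ("[S,{$}]", "ab"): ["a", "[A,{a$}]", "a"],
    ("[S,{$}]", "aa"): ["a", "[A,{a$}]", "a"],
    ("[S,{$}]", "bb"): ["b", "[A,{ba}]", "b", "a"],
    ("[A,{a$}]", "ba"): ["b"],
    ("[A,{a$}]", "a$"): [],          # A -> lambda
    ("[A,{ba}]", "bb"): ["b"],
    ("[A,{ba}]", "ba"): [],          # A -> lambda
}


def parse(tokens: str) -> bool:
    stack = ["[G,{λ}]"]              # start symbol [G,{lambda}]
    pos = 0
    while stack:
        top = stack.pop()
        if top.startswith("["):      # split nonterminal [A,L]: predict
            rhs = TABLE.get((top, tokens[pos:pos + K]))
            if rhs is None:
                return False         # error entry: no production applies
            stack.extend(reversed(rhs))  # leftmost symbol ends on top
        elif tokens[pos:pos + 1] == top:
            pos += 1                 # match a terminal
        else:
            return False             # terminal mismatch
    return pos == len(tokens)


print(parse("abba$"))   # False: [A,{a$}] has no entry for lookahead bb
print(parse("bbba$"))   # True
```

The driver is the same loop as lldriver without action symbols; only the stack symbols change, since each nonterminal carries its right context L.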