
# Practical Application: Parsing


13. Parsing

Parsing is one of the major functions of a compiler. Given a source program w, the parser examines w to see whether it can be derived by the grammar of the programming language and, if it can, constructs a parse tree yielding w. Based on this parse tree, the compiler generates object code. So the parser acts as a membership test algorithm designed for a given grammar G: given a string w, it tells us whether w is in L(G) and, if it is, outputs a parse tree.

Notice that the parser tests membership based on the given grammar. Recall that when we practiced constructing a PDA for a given language, say {aⁱbⁱ | i > 0}, we used structural information about the language: the a’s come first, then the b’s, and the numbers of a’s and b’s are the same. Consider the two CFG’s G1 and G2 shown in figure (a) below, which generate the same language {aⁱbⁱ | i > 0}. Figure (b) shows a PDA that recognizes this language. For an input string w, this PDA gives no information about the grammar or about how w is derived. Hence, we need a different approach to construct a parser based on the grammar, not the language.

Several parsing algorithms are available that, given an arbitrary CFG G and a string x, tell whether x ∈ L(G) and, if it is, output how x is derived. (The CYK algorithm, shown in Appendix F, is a typical example.) However, these algorithms are too slow to be practical; the CYK algorithm, for example, takes O(n³) time for an input string of length n. Thus, we restrict CFG’s to a subclass for which we can build a fast, practical parser. This chapter presents two parsing strategies applicable to such restricted grammars, together with several design examples. Finally, the chapter briefly introduces Lex (the lexical analyzer generator) and YACC (the parser generator).

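G1: S → aSb | ab
G2: S → aA    A → Sb | b

(a) Two CFG’s generating {aⁱbⁱ | i > 0}

[Figure (b): state diagram of a PDA recognizing {aⁱbⁱ | i > 0}; transition labels: (a, Z0/aZ0), (a, a/aa), (b, a/ε), (b, a/ε), (ε, Z0/Z0)]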


13.1 Derivation
  Leftmost derivation, rightmost derivation
  Derivations and parse trees
13.2 LL(k) parsing strategy
13.3 Designing an LL(k) parser
  Examples
  Definition of LL(k) grammars
13.4 LR(k) parsing strategy
13.5 Designing LR(k) parsers
  Examples
  Definition of LR(k) grammars
13.6 Lex and YACC
Rumination
Exercises

Memories

Break Time

Two very elderly ladies were enjoying the sunshine on a park bench in Miami. They had been meeting at the park every sunny day for over 12 years, chatting and enjoying each other's friendship. One day, the younger of the two ladies turns to the other and says, “Please don't be angry with me dear, but I am embarrassed, after all these years... What is your name? I am trying to remember, but I just can't.” The older friend stares at her, looking very distressed, says nothing for 2 full minutes, and finally, with tearful eyes, says, “How soon do you have to know?” - overheard by Rubin -

13.1 Derivation

The parser of a grammar generates a parse tree for a given input string. For convenience, the tree is commonly presented as a sequence of rules, applied in one of the following two ways to derive the input string starting with S.

• Leftmost derivation: A rule is applied to the leftmost nonterminal symbol in the current sentential form.
• Rightmost derivation: A rule is applied to the rightmost nonterminal symbol in the current sentential form.

Example:

G: S → ABC    A → aa    B → a    C → cC | c

Leftmost derivation: S ⇒ ABC ⇒ aaBC ⇒ aaaC ⇒ aaacC ⇒ aaacc
Rightmost derivation: S ⇒ ABC ⇒ ABcC ⇒ ABcc ⇒ Aacc ⇒ aaacc
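The two derivation orders can be reproduced mechanically. The following is a minimal sketch, assuming single-character symbols and a grammar stored as a Python dictionary; the names `GRAMMAR` and `derive` are illustrative and not part of the text.

```python
# Grammar of the example: each nonterminal maps to its alternatives.
GRAMMAR = {
    "S": ["ABC"],
    "A": ["aa"],
    "B": ["a"],
    "C": ["cC", "c"],
}

def derive(start, choices, leftmost=True):
    """Return the sentential forms obtained by applying `choices` in order.

    `choices` is a list of (nonterminal, alternative_index) pairs. Each pair
    rewrites the leftmost (or rightmost) nonterminal of the current form.
    """
    form = start
    steps = [form]
    for nonterminal, alt in choices:
        positions = [i for i, ch in enumerate(form) if ch in GRAMMAR]
        if not positions:
            raise ValueError("no nonterminal left to rewrite")
        pos = positions[0] if leftmost else positions[-1]
        if form[pos] != nonterminal:
            raise ValueError(f"expected {nonterminal}, found {form[pos]}")
        form = form[:pos] + GRAMMAR[nonterminal][alt] + form[pos + 1:]
        steps.append(form)
    return steps

leftmost = derive("S", [("S", 0), ("A", 0), ("B", 0), ("C", 0), ("C", 1)])
rightmost = derive("S", [("S", 0), ("C", 0), ("C", 1), ("B", 0), ("A", 0)],
                   leftmost=False)
print(" => ".join(leftmost))    # S => ABC => aaBC => aaaC => aaacC => aaacc
print(" => ".join(rightmost))   # S => ABC => ABcC => ABcc => Aacc => aaacc
```

Both calls use the same five rules; only the order in which they are applied differs.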


The derivation sequences, either leftmost or rightmost, are more convenient to deal with than the tree data structure. However, to generate object code there must be a simple way to translate a derivation sequence into its unique parse tree. The following two observations show how this can be done.

Observation 1: The sequence of rules applied in the leftmost derivation corresponds to the order of the nodes visited when you traverse the parse tree top-down, left-to-right, visiting each node before its children (i.e., in depth-first preorder). (See the following example.)

G: S → ABC    A → aa    B → bD    C → cC | c    D → bd

Leftmost derivation (the number after each arrow is the step number):
S ⇒(1) ABC ⇒(2) aaBC ⇒(3) aabDC ⇒(4) aabbdC ⇒(5) aabbdcC ⇒(6) aabbdcc

[Figure: parse tree of aabbdcc with the interior nodes numbered in the order their rules are applied: 1 S, 2 A, 3 B, 4 D, 5 C (upper), 6 C (lower)]


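Observation 2: The reverse of the order of the rules applied in the rightmost derivation corresponds to the order of the nodes visited when you traverse the parse tree bottom-up, left-to-right. (See the following example.)

G: S → ABC    A → aa    B → bD    C → cC | c    D → bd

Rightmost derivation (the number after each arrow is the step number):
S ⇒(1) ABC ⇒(2) ABcC ⇒(3) ABcc ⇒(4) AbDcc ⇒(5) Abbdcc ⇒(6) aabbdcc

[Figure: parse tree of aabbdcc with the interior nodes numbered by derivation step: 1 S, 2 C (upper), 3 C (lower), 4 B, 5 D, 6 A. A bottom-up, left-to-right traversal visits them in the order 6, 5, 4, 3, 2, 1.]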
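Both observations can be checked on the example tree. The sketch below assumes that "top-down, left-to-right" means a parent-first (preorder) walk and that "bottom-up, left-to-right" means a children-first (postorder) walk; the tuple encoding of the tree and the function names are illustrative choices, not the text's.

```python
# A node is (label, children); a terminal is a leaf with no children.
def leaf(symbol):
    return (symbol, [])

# Parse tree of aabbdcc for G: S -> ABC, A -> aa, B -> bD, D -> bd, C -> cC | c
TREE = ("S", [
    ("A", [leaf("a"), leaf("a")]),
    ("B", [leaf("b"), ("D", [leaf("b"), leaf("d")])]),
    ("C", [leaf("c"), ("C", [leaf("c")])]),
])

def rule_of(node):
    """The rule applied at an interior node, e.g. 'S -> ABC'."""
    label, children = node
    return f"{label} -> {''.join(child[0] for child in children)}"

def preorder(node):
    """Rules of interior nodes, parent before children, left to right."""
    if not node[1]:
        return []
    rules = [rule_of(node)]
    for child in node[1]:
        rules += preorder(child)
    return rules

def postorder(node):
    """Rules of interior nodes, children before parent, left to right."""
    if not node[1]:
        return []
    rules = []
    for child in node[1]:
        rules += postorder(child)
    return rules + [rule_of(node)]

LEFTMOST = ["S -> ABC", "A -> aa", "B -> bD", "D -> bd", "C -> cC", "C -> c"]
RIGHTMOST = ["S -> ABC", "C -> cC", "C -> c", "B -> bD", "D -> bd", "A -> aa"]

print(preorder(TREE) == LEFTMOST)          # Observation 1
print(postorder(TREE) == RIGHTMOST[::-1])  # Observation 2
```

Under these assumptions both comparisons print True, so either rule sequence is enough to rebuild the tree.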


13.2 LL(k) parsing strategy

We know that parsers are different from PDA’s, because their membership test must be based on the given CFG. Let’s try to build a conventional DPDA which, with the grammar G stored in the finite control, tests whether the input string x is in L(G) and, if it is, outputs the sequence of rules applied to derive x. We equip the finite control with an output port (see figure (b) below). Our first strategy is to derive the same input string x in the stack. Because any string must be derived starting with the start symbol, we let the machine push S onto the stack and enter state q1 for the next move. For convenience, we assign a rule number to each rule, as shown in figure (a).

G: (1) S → AB    (2) S → AC
   (3) A → aaaaaaaaaa
   (4) B → bB    (5) B → b
   (6) C → cC    (7) C → c

L(G) = {a¹⁰x | x = bⁱ or x = cⁱ, i ≥ 1}

(a)

[Figure (b): a DPDA with G in its finite control and an output port, in state q1 with input tape aaaaaaaaaabbb and stack SZ0, about to decide which rule to apply to S]


Now, we ask which rule, (1) or (2), the machine should apply to S to eventually derive the string on the input tape. If the input string is derived using rule (1) first, there must be the symbol b after the 10th a; if it is derived using rule (2) first, there must be the symbol c. Unfortunately, our conventional DPDA model cannot look ahead on the input before reading it. Recall that a conventional DPDA decides whether or not to read the input depending on the stack top symbol. Only after reading an input symbol does the machine know what it is. Thus, without reading up to the 11th input symbol, there is no way for the machine in the figure to identify the symbol at that position.


To overcome this problem, we equip the finite state control with a “telescope” with which the machine can look some finite k cells ahead on the input tape. For the grammar G, it is enough to have a telescope with the range of 11 cells. (Notice that for the range to look ahead, we also include the cell under the head.) With this new capability, the machine scans the input string ahead in the range, and, based on what it sees ahead, it takes the next move. While looking ahead, the input head does not move.


Now the parser, looking ahead 11 cells, sees aaaaaaaaaab. Since there is a b at the end, the machine chooses rule (1) (i.e., S → AB), rewrites the stack top S with AB, and outputs rule number (1), as shown in figure (a). Let q, α, and β be, respectively, the current state, the remaining portion of the input to read, and the current stack contents. From now on, for convenience, we shall use the triple (q, α, β), called the configuration, instead of drawing the cumbersome diagram of the parser.

[Figure: (a) the parser after applying rule S → AB: state q1, input tape aaaaaaaaaabbb, stack ABZ0, rule number (1) at the output port; (b) a configuration (q, α, β)]


Looking ahead 11 cells in the current configuration (q1, aaaaaaaaaabbb, SZ0), the parser applies rule (1) by rewriting the stack top S with the rule’s right side AB. Consequently, the configuration changes as follows, where the number after an arrow is the rule applied.

(q0, aaaaaaaaaabbb, Z0) ⇒ (q1, aaaaaaaaaabbb, SZ0) ⇒(1) (q1, aaaaaaaaaabbb, ABZ0)    [look-ahead 11 cells: aaaaaaaaaab]

Now, with nonterminal symbol A at the stack top, the parser must find a rule to apply. Since A has only one rule, rule (3), there is no choice. So the parser applies rule (3), changing the configuration as follows.

(q1, aaaaaaaaaabbb, ABZ0) ⇒(3) (q1, aaaaaaaaaabbb, aaaaaaaaaaBZ0)


Notice that the terminal symbol appearing at the stack top after applying rule (3) corresponds to the leftmost terminal symbol appearing in the leftmost derivation. Thus, if the input string is generated by the grammar, the terminal symbol at the stack top must match the next input symbol. So the parser, seeing a terminal symbol at the stack top, reads the input and, if they match, pops the stack top. The following sequence of configurations shows how the parser successfully pops all the terminal symbols pushed on the stack by rule (3).

(q0, aaaaaaaaaabbb, Z0) ⇒ (q1, aaaaaaaaaabbb, SZ0)
⇒(1) (q1, aaaaaaaaaabbb, ABZ0)
⇒(3) (q1, aaaaaaaaaabbb, aaaaaaaaaaBZ0)
⇒ . . . ⇒ (q1, abbb, aBZ0) ⇒ (q1, bbb, BZ0)
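The two kinds of move seen so far, rewriting a nonterminal at the stack top and matching a terminal against the input, can be written as small functions on configurations. This is only a sketch: the names `Config`, `expand`, and `match` are made up for illustration, and the stack is assumed to be a tuple with its top first and the bottom marker Z0 stored as one symbol.

```python
from typing import NamedTuple, Tuple

class Config(NamedTuple):
    state: str                # q
    remaining: str            # alpha: the input still to be read
    stack: Tuple[str, ...]    # beta: stack contents, top first

def expand(config: Config, rhs: str) -> Config:
    """Rewrite the nonterminal at the stack top with a rule's right side.
    The input head does not move."""
    return config._replace(stack=tuple(rhs) + config.stack[1:])

def match(config: Config) -> Config:
    """Read one input symbol and pop the stack top if the two agree."""
    if not config.remaining or config.remaining[0] != config.stack[0]:
        raise ValueError("stack top does not match the next input symbol")
    return config._replace(remaining=config.remaining[1:],
                           stack=config.stack[1:])

c = Config("q1", "a" * 10 + "bbb", ("S", "Z0"))
c = expand(c, "AB")        # rule (1): S -> AB
c = expand(c, "a" * 10)    # rule (3): A -> aaaaaaaaaa
for _ in range(10):        # match and pop the ten a's
    c = match(c)
print(c)                   # the configuration (q1, bbb, BZ0)
```

Which right side to pass to `expand` is exactly the decision the look-ahead has to make; the sketch leaves that choice to the caller.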


(q0, aaaaaaaaaabbb, Z0) ⇒ (q1, aaaaaaaaaabbb, SZ0)
⇒(1) (q1, aaaaaaaaaabbb, ABZ0)
⇒(3) (q1, aaaaaaaaaabbb, aaaaaaaaaaBZ0)
⇒ . . . ⇒ (q1, abbb, aBZ0) ⇒ (q1, bbb, BZ0) ⇒ ?

Now, the parser must choose one of B’s rules, either (4) or (5). If only one b remains on the input tape, rule (5) is the choice. Otherwise (i.e., if there is more than one b), rule (4) must be applied. It follows that the parser needs to look two cells ahead, and it proceeds as follows.

(q1, bbb, BZ0) ⇒(4) (q1, bbb, bBZ0) ⇒ (q1, bb, BZ0) ⇒(4) (q1, bb, bBZ0) ⇒ (q1, b, BZ0) ⇒(5) (q1, b, bZ0) ⇒ (q1, ε, Z0)

Here the 2-cell look-ahead is bb for the two applications of rule (4), and b followed by a blank for rule (5).


In summary, our parser works as follows. The number after an arrow is the rule applied, and the look-ahead contents used for the choice are shown in brackets.

(q0, aaaaaaaaaabbb, Z0) ⇒ (q1, aaaaaaaaaabbb, SZ0)
⇒(1) (q1, aaaaaaaaaabbb, ABZ0)    [aaaaaaaaaab]
⇒(3) (q1, aaaaaaaaaabbb, aaaaaaaaaaBZ0)    [no look-ahead needed]
⇒ . . . ⇒ (q1, abbb, aBZ0) ⇒ (q1, bbb, BZ0)
⇒(4) (q1, bbb, bBZ0) ⇒ (q1, bb, BZ0)    [bb]
⇒(4) (q1, bb, bBZ0) ⇒ (q1, b, BZ0)    [bb]
⇒(5) (q1, b, bZ0) ⇒ (q1, ε, Z0)    [b followed by a blank]

Notice that the last configuration above implies a successful parse. It shows that the sequence of rules applied on the stack generates exactly the same string as the one originally written on the input tape. If the parser fails to reach the accepting configuration, we say the input is rejected. In the above example, the sequence of rules applied to the nonterminal symbols appearing at the stack top matches the sequence of rules applied in the leftmost derivation of the input string, shown below.

S ⇒(1) AB ⇒(3) aaaaaaaaaaB ⇒(4) aaaaaaaaaabB ⇒(4) aaaaaaaaaabbB ⇒(5) aaaaaaaaaabbb


For the other input strings, those ending with c’s, the parser can apply the same strategy and successfully parse them by looking ahead at most 11 cells (see below). This parser is called an LL(11) parser, named after the following properties: the input is read Left-to-right, the order of the rules applied matches the order of the Leftmost derivation, and the longest look-ahead range is 11 cells. For a grammar G, if we can build an LL(k) parser for some constant k, we call G an LL(k) grammar.

(q0, aaaaaaaaaaccc, Z0) ⇒ (q1, aaaaaaaaaaccc, SZ0)
⇒(2) (q1, aaaaaaaaaaccc, ACZ0)
⇒(3) (q1, aaaaaaaaaaccc, aaaaaaaaaaCZ0)
⇒ . . . ⇒ (q1, accc, aCZ0) ⇒ (q1, ccc, CZ0)
⇒(6) (q1, ccc, cCZ0) ⇒ (q1, cc, CZ0)
⇒(6) (q1, cc, cCZ0) ⇒ (q1, c, CZ0)
⇒(7) (q1, c, cZ0) ⇒ (q1, ε, Z0)


Formally, an LL(k) parser is defined by a parse table with the nonterminal symbols on the rows and the look-ahead contents on the columns. The table entries are the right sides of the rules applied. Blank entries are the rejecting cases. The parse table below is constructed from our observations of how the parser should work for the given input strings. In the look-ahead contents, X is a don’t-care symbol, B denotes a blank cell, and ε means no look-ahead is needed.

G: (1) S → AB    (2) S → AC
   (3) A → aaaaaaaaaa
   (4) B → bB    (5) B → b
   (6) C → cC    (7) C → c

Parse Table (contents of 11 look-ahead)

| Stack top | a¹⁰b | a¹⁰c | bbX⁹ | bB¹⁰ | ccX⁹ | cB¹⁰ | ε |
|-----------|------|------|------|------|------|------|-----|
| S         | AB   | AC   |      |      |      |      |     |
| A         |      |      |      |      |      |      | a¹⁰ |
| B         |      |      | bB   | b    |      |      |     |
| C         |      |      |      |      | cC   | c    |     |
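A parse table of this kind can drive a short stack machine. The following is a minimal sketch, not a definitive implementation: the names `TABLE`, `matches`, and `parse` are illustrative, the character "_" is an assumed stand-in for a blank cell (so that it cannot be confused with the nonterminal B), and the stack is a Python list with its top at the end.

```python
K = 11          # look-ahead range
BLANK = "_"     # assumed marker for a blank cell past the end of the input

# Right sides of rules (1)-(7) of G.
RULES = {1: "AB", 2: "AC", 3: "a" * 10, 4: "bB", 5: "b", 6: "cC", 7: "c"}

# Parse table: nonterminal -> list of (look-ahead pattern, rule number).
# "X" is a don't-care symbol; the empty pattern means no look-ahead is needed.
TABLE = {
    "S": [("a" * 10 + "b", 1), ("a" * 10 + "c", 2)],
    "A": [("", 3)],
    "B": [("bb" + "X" * 9, 4), ("b" + BLANK * 10, 5)],
    "C": [("cc" + "X" * 9, 6), ("c" + BLANK * 10, 7)],
}

def matches(pattern, window):
    """True if every pattern symbol is a don't-care or equals the cell."""
    return all(p == "X" or p == w for p, w in zip(pattern, window))

def parse(text):
    """Return the rule numbers in the order applied, or None on rejection."""
    stack = ["Z0", "S"]            # top of the stack is the last element
    pos, output = 0, []
    while True:
        top = stack[-1]
        if top == "Z0":            # accept only if the whole input was read
            return output if pos == len(text) else None
        if top in TABLE:           # nonterminal: consult the parse table
            window = text[pos:pos + K].ljust(K, BLANK)
            for pattern, rule in TABLE[top]:
                if matches(pattern, window):
                    stack.pop()
                    stack.extend(reversed(RULES[rule]))
                    output.append(rule)
                    break
            else:
                return None        # blank table entry: reject
        elif pos < len(text) and text[pos] == top:
            stack.pop()            # terminal: match and pop
            pos += 1
        else:
            return None

print(parse("a" * 10 + "bbb"))   # [1, 3, 4, 4, 5]
print(parse("a" * 10 + "ccc"))   # [2, 3, 6, 6, 7]
print(parse("a" * 9 + "b"))      # None
```

The returned rule numbers are the ones the parser would send to its output port, in leftmost-derivation order.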


13.3 Designing an LL(k) Parser

Example 1. Design an LL(k) parser with minimum k for the following CFG.

(1) S → aSb    (2) S → aabbb

The language of this grammar is {aⁱaabbbbⁱ | i ≥ 0}. Every string generated by this grammar has aabbb at the center. As we did in the preceding section, let’s examine how an LL(k) parser will parse the input aaaaabbbbbb with the shortest possible look-ahead range k. To parse the input string successfully, the machine should apply the rules in the order (1), (1), (1), (2), which is the order in which they are applied in the following leftmost derivation.

S ⇒(1) aSb ⇒(1) aaSbb ⇒(1) aaaSbbb ⇒(2) aaaaabbbbbb


Pushing the start symbol S onto the stack in the initial configuration, the parser gets ready to parse the string, as shown below. With S at the stack top, it must apply one of S’s two rules. To choose one of them, the parser needs to look ahead for supporting information. What is the shortest range it could look ahead?

(q0, aaaaabbbbbb, Z0) ⇒ (q1, aaaaabbbbbb, SZ0) ⇒ ?

If there is aabbb ahead, rule (2) must be applied. So it appears that k = 5. But the parser does not have to see the whole string. If there is aaa ahead, the leftmost symbol a must have been generated by rule (1). Otherwise, if there is aab ahead, the leftmost a must have been generated by rule (2). It is enough to look ahead 3 cells (i.e., k = 3). Thus, in the current configuration, since the contents of the 3-cell look-ahead are aaa, the parser applies rule (1), then reads the input to match and pop the terminal symbol a from the stack top, as follows.

(q1, aaaaabbbbbb, SZ0) ⇒(1) (q1, aaaaabbbbbb, aSbZ0) ⇒ (q1, aaaabbbbbb, SbZ0)    [look-ahead 3: aaa]


Again, with S at the stack top, the parser looks ahead 3 cells and, seeing aaa, applies rule (1). It repeats the same procedure until it sees aab ahead, as follows.

(q1, aaaaabbbbbb, SZ0) ⇒(1) (q1, aaaaabbbbbb, aSbZ0) ⇒ (q1, aaaabbbbbb, SbZ0)
⇒(1) (q1, aaaabbbbbb, aSbbZ0) ⇒ (q1, aaabbbbbb, SbbZ0)
⇒(1) (q1, aaabbbbbb, aSbbbZ0) ⇒ (q1, aabbbbbb, SbbbZ0) ⇒ ?

Now, the parser finally applies rule (2), and keeps reading, matching, and popping until it enters the accepting configuration, as follows.

(q1, aabbbbbb, SbbbZ0) ⇒(2) (q1, aabbbbbb, aabbbbbbZ0) ⇒ … ⇒ (q1, ε, Z0)


The parser applied the rules in the order (1), (1), (1), (2), which is the same order as in the leftmost derivation of the input string aaaaabbbbbb.

S ⇒(1) aSb ⇒(1) aaSbb ⇒(1) aaaSbbb ⇒(2) aaaaabbbbbb

Given an arbitrary input string, the parser, applying the same procedure, will end up in the final accepting configuration if and only if the input belongs to the language of the grammar. The parser needs to look ahead at least 3 cells, because with only 2 cells it would see aa under either rule. Hence, the grammar is LL(3). The parse table is shown below.

(1) S → aSb    (2) S → aabbb

Parse Table (contents of 3 look-ahead)

| Stack top | aaa | aab   |
|-----------|-----|-------|
| S         | aSb | aabbb |
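The sequence of configurations for Example 1 can be generated step by step. The sketch below assumes the stack is kept as a string with its top at the left and writes the bottom marker Z0 as the single character "Z"; the function name `trace` is illustrative.

```python
def trace(text):
    """LL(3) parse for S -> aSb | aabbb.

    Returns (accepted, configurations), where each configuration is a
    triple (state, remaining input, stack) and "Z" stands for Z0.
    """
    table = {"aaa": "aSb", "aab": "aabbb"}    # row S of the parse table
    remaining, stack = text, "SZ"
    configs = [("q0", text, "Z"), ("q1", remaining, stack)]
    while stack != "Z":
        top = stack[0]
        if top == "S":                        # choose a rule by 3 look-ahead
            rhs = table.get(remaining[:3])
            if rhs is None:
                return False, configs         # blank table entry: reject
            stack = rhs + stack[1:]
        elif remaining[:1] == top:            # match and pop a terminal
            remaining, stack = remaining[1:], stack[1:]
        else:
            return False, configs
        configs.append(("q1", remaining, stack))
    return remaining == "", configs

accepted, configs = trace("aaaaabbbbbb")
print(accepted)
for config in configs[:4]:
    print(config)
```

The first four configurations printed are (q0, aaaaabbbbbb, Z0), (q1, aaaaabbbbbb, SZ0), (q1, aaaaabbbbbb, aSbZ0), and (q1, aaaabbbbbb, SbZ0), in the tuple notation of the sketch.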


Example 2. Construct an LL(k) parser with minimum k for the following CFG.

(1) S → abA    (2) S → ε
(3) A → Saa    (4) A → b

As we did for Example 1, we pick a typical string derivable by the grammar, ababaaaa, and examine how it can be parsed according to the LL(k) parsing strategy with minimum k. Then, based on the analysis, we will construct a parse table. The order of the rules applied by the parser should be the same as in the following leftmost derivation.

S ⇒(1) abA ⇒(3) abSaa ⇒(1) ababAaa ⇒(3) ababSaaaa ⇒(2) ababaaaa

Having pushed the start symbol S onto the stack, the parser must choose either rule (1) or rule (2), whichever will finally lead to deriving the input string. Is there any useful information ahead on the input tape for making this choice?

(q0, ababaaaa, Z0) ⇒ (q1, ababaaaa, SZ0) ⇒ ?


If the input is not empty, the parser, with S at the stack top, should choose rule (1). Then, as shown below, for each terminal symbol appearing at the stack top, the parser reads the next input symbol and, if they match, pops the stack top, until A appears. If the input tape were empty, the parser would simply pop S (i.e., rewrite S with ε) and enter the accepting configuration. Now, with A at the stack top, the parser must choose between rules (3) and (4).

(q1, ababaaaa, SZ0) ⇒(1) (q1, ababaaaa, abAZ0) ⇒ . . ⇒ (q1, abaaaa, AZ0) ⇒ ?

If rule (4) had been used to derive the input, the next input symbol would be b, not a. Seeing the symbol a ahead, the parser applies rule (3). Consequently, having S at the stack top as before, it needs to look ahead to choose the next rule. Up to this point, it appears that 1 look-ahead is an appropriate range.

(q1, abaaaa, AZ0) ⇒(3) (q1, abaaaa, SaaZ0) ⇒ ?


But this time, with S at the stack top, it is uncertain which rule to apply. Seeing a ahead, the parser could apply either rule (1) or rule (2), because in either case it will successfully match the stack top a with the next input symbol a (see below).

(q1, abaaaa, SaaZ0) ⇒(1) (q1, abaaaa, abAaaZ0)
(q1, abaaaa, SaaZ0) ⇒(2) (q1, abaaaa, aaZ0)

To resolve this uncertainty, the parser needs one more cell of look-ahead. We could instead have the parser look down the stack, but we have chosen to extend the look-ahead range, which is a straightforward solution. Later in this chapter, we will discuss parsers that are allowed to look down the stack to some finite depth.

Now, seeing ab ahead in the extended range, which must have been generated by rule (1), the parser applies that rule and repeats the previous procedure, as follows, until S appears at the stack top again.

(q1, abaaaa, SaaZ0) ⇒(1) (q1, abaaaa, abAaaZ0) ⇒ . . ⇒ (q1, aaaa, AaaZ0) ⇒(3) (q1, aaaa, SaaaaZ0) ⇒ ?


Seeing aa ahead with S at the stack top, the parser applies rule (2). Then, for each a appearing at the stack top, it keeps reading the next input symbol, matching, and popping the stack top, eventually entering the accepting configuration.

(q1, aaaa, SaaaaZ0) ⇒(2) (q1, aaaa, aaaaZ0) ⇒ . . . . ⇒ (q1, ε, Z0)

In summary, the parser parses the input string ababaaaa as follows.

(q1, ababaaaa, SZ0) ⇒(1) (q1, ababaaaa, abAZ0) ⇒ . . ⇒ (q1, abaaaa, AZ0)
⇒(3) (q1, abaaaa, SaaZ0)
⇒(1) (q1, abaaaa, abAaaZ0) ⇒ . . ⇒ (q1, aaaa, AaaZ0)
⇒(3) (q1, aaaa, SaaaaZ0)
⇒(2) (q1, aaaa, aaaaZ0) ⇒ . . . . ⇒ (q1, ε, Z0)


The input string that we have just examined is one derived by applying rule (2) last. For the other typical string, ababbaa, which is derived by applying rule (4) last, the LL(2) parser will parse it as follows.

(q1, ababbaa, SZ0) ⇒(1) (q1, ababbaa, abAZ0) ⇒ . . ⇒ (q1, abbaa, AZ0)
⇒(3) (q1, abbaa, SaaZ0)
⇒(1) (q1, abbaa, abAaaZ0) ⇒ . . ⇒ (q1, baa, AaaZ0)
⇒(4) (q1, baa, baaZ0) ⇒ . . . . ⇒ (q1, ε, Z0)

From the analysis of the two parsing examples, we construct the following parse table. (Notice that with A at the stack top, 1 look-ahead is enough, but the entries are placed under the columns of 2 look-ahead.)

(1) S → abA    (2) S → ε
(3) A → Saa    (4) A → b

Parse Table (contents of 2 look-ahead; B: blank, X: don’t care)

| Stack top | ab  | aa  | bX | BB |
|-----------|-----|-----|----|----|
| S         | abA | ε   |    | ε  |
| A         | Saa | Saa | b  |    |
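The same table can also be realized without an explicit stack. The sketch below swaps in recursive descent, a different but equivalent way of implementing LL parsing in which the call stack of the host language plays the role of the parser's stack; it is not the stack machine of the text. The class and method names are illustrative, and an empty look-ahead string stands for the blank columns.

```python
class Parser:
    """Recursive-descent realization of the LL(2) table for
    (1) S -> abA, (2) S -> epsilon, (3) A -> Saa, (4) A -> b."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.rules = []              # rule numbers in the order applied

    def peek(self, k):
        """The next k input symbols; the input head does not move."""
        return self.text[self.pos:self.pos + k]

    def expect(self, symbol):
        """Match one terminal against the input and advance the head."""
        if self.peek(1) != symbol:
            raise SyntaxError(f"expected {symbol!r} at position {self.pos}")
        self.pos += 1

    def parse_S(self):
        look = self.peek(2)
        if look == "ab":             # table entry S / ab
            self.rules.append(1)
            self.expect("a")
            self.expect("b")
            self.parse_A()
        elif look in ("aa", ""):     # table entries S / aa and S / BB
            self.rules.append(2)
        else:
            raise SyntaxError("no rule for S")

    def parse_A(self):
        look = self.peek(2)
        if look in ("ab", "aa"):     # table entries A / ab and A / aa
            self.rules.append(3)
            self.parse_S()
            self.expect("a")
            self.expect("a")
        elif look[:1] == "b":        # table entry A / bX
            self.rules.append(4)
            self.expect("b")
        else:
            raise SyntaxError("no rule for A")

def parse(text):
    """Return the applied rule numbers, or None if the input is rejected."""
    parser = Parser(text)
    try:
        parser.parse_S()
    except SyntaxError:
        return None
    return parser.rules if parser.pos == len(text) else None

print(parse("ababaaaa"))   # [1, 3, 1, 3, 2]
print(parse("ababbaa"))    # [1, 3, 1, 4]
print(parse("aa"))         # None
```

Each `parse_` method looks at two cells before committing to a rule, which mirrors one row of the parse table.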


For a given input string, the basic strategy of LL(k) parsing is to generate the same string on the top of the stack by rewriting every nonterminal symbol appearing at the stack top with the right side of one of that nonterminal’s rules. If the nonterminal symbol has more than one rule, the parser picks the right one based on the prefix of the input string appearing in the k cells looked ahead. Whenever a terminal symbol appears at the stack top, the machine reads the next input symbol and pops the stack top if they match. The sequence of rules applied in a successful parse according to this strategy is the same as the one applied in the leftmost derivation of the input string.

The class of CFG’s that can be parsed by the LL(k) parsing strategy is limited. The CFG G1 below is an example for which no LL(k) parser exists. However, G2, which generates the same language, is an LL(k) grammar. We will shortly explain why.

G1: S → A | B    A → aA | 0    B → aB | 1
G2: S → aS | D    D → 0 | 1

L(G1) = L(G2) = {aⁱt | i ≥ 0, t ∈ {0, 1}}


Consider the first working configuration for G1, illustrated below, with the start symbol S on top of the stack. The parser must choose one of S’s two rules, S → A or S → B. But it is impossible to choose correctly, because the rightmost symbol 0 (or 1), which is essential for the correct choice, can be located arbitrarily far to the right. No LL(k) parser can see it ahead with its “telescope” of finite range k. For the grammar G2, in contrast, we can easily design an LL(1) parser.

[Figure: the parser for G1 in state q1 with input tape aaaa.....aa0 and stack SZ0, unable to decide which rule to apply to S]
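The contrast between G1 and G2 can be made concrete. In the sketch below, `first_rule_g1` (a hypothetical helper that assumes its argument is in L(G1)) reports the rule G1 must apply to S first, and `parse_g2` is one possible LL(1) parser for G2, written as a loop that decides each step from a single input symbol.

```python
def first_rule_g1(text):
    """The rule G1 must apply to S first, for text assumed to be in L(G1).
    The choice depends on the LAST symbol of the input."""
    return "S -> A" if text.endswith("0") else "S -> B"

def parse_g2(text):
    """LL(1) parse with G2: S -> aS | D, D -> 0 | 1.
    Returns the rules in the order applied, or None on rejection."""
    rules, pos = [], 0
    while pos < len(text) and text[pos] == "a":   # look-ahead a: S -> aS
        rules.append("S -> aS")
        pos += 1
    if text[pos:] in ("0", "1"):                  # look-ahead 0 or 1: S -> D
        rules += ["S -> D", f"D -> {text[pos]}"]
        return rules
    return None

# For every k, the strings a^k 0 and a^k 1 look identical through a k-cell
# telescope, yet G1 needs different first rules for them.
for k in (1, 5, 50):
    x, y = "a" * k + "0", "a" * k + "1"
    assert x[:k] == y[:k]
    assert first_rule_g1(x) != first_rule_g1(y)

print(parse_g2("aa0"))   # ['S -> aS', 'S -> aS', 'S -> D', 'D -> 0']
```

G2 postpones the choice between 0 and 1 until that symbol is directly under the head, which is why one cell of look-ahead suffices.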

Definition of LL(k) grammars

We have just seen two CFG’s that generate the same language; for one of them no LL(k) parser exists, while for the other we can design an LL(k) parser. So we may ask: what is the defining property of LL(k) grammars?

For a string x, let ⁽ᵏ⁾x denote the prefix of length k of x. If |x| < k, then ⁽ᵏ⁾x = x. For example, ⁽²⁾ababaa = ab and ⁽³⁾ab = ab.

Definition (LL(k) grammar). Let G = (V_T, V_N, P, S) be a CFG. Grammar G is an LL(k) grammar if it satisfies the following condition. Consider two arbitrary leftmost derivations of the following forms,

S ⇒* ωAα ⇒ ωβα ⇒* ωy
S ⇒* ωAα ⇒ ωγα ⇒* ωx

where α, β, γ ∈ (V_T ∪ V_N)*, ω, x, y ∈ V_T*, and A ∈ V_N. If ⁽ᵏ⁾x = ⁽ᵏ⁾y, then it must be that β = γ. That is, in the above two derivations, the same rule of A must have been applied if ⁽ᵏ⁾x = ⁽ᵏ⁾y.

The above condition implies that, with a nonterminal symbol A at the stack top, the parser can identify which of A’s rules to apply by looking ahead k cells. If G has this property, we can build an LL(k) parser.
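The condition can be explored by brute force on small grammars. The sketch below is a bounded search, not a decision procedure: it enumerates leftmost sentential forms up to an assumed length limit `max_len` and looks for two different rules of the same nonterminal whose continuations share a k-prefix. A reported conflict is a genuine violation of the LL(k) condition, but finding none within the bound proves nothing. All names are illustrative, and symbols are assumed to be single characters.

```python
def k_prefix(x, k):
    """The prefix of length k of x, or x itself if it is shorter than k."""
    return x[:k]

def leftmost_nonterminal(form, grammar):
    return next((i for i, s in enumerate(form) if s in grammar), None)

def reachable_forms(start, grammar, max_len):
    """Left-sentential forms derivable from `start`, discarding any
    derived form longer than max_len."""
    seen, todo = {start}, [start]
    while todo:
        form = todo.pop()
        idx = leftmost_nonterminal(form, grammar)
        if idx is None:
            continue
        for rhs in grammar[form[idx]]:
            nxt = form[:idx] + rhs + form[idx + 1:]
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen

def find_conflict(grammar, start, k, max_len=10):
    """Return (form, rhs1, rhs2, shared k-prefix) for the first violation
    found within the bound, or None if none is found."""
    for form in sorted(reachable_forms(start, grammar, max_len)):
        idx = leftmost_nonterminal(form, grammar)
        if idx is None:
            continue
        alpha = form[idx + 1:]
        owner = {}                        # k-prefix -> right side seen first
        for rhs in grammar[form[idx]]:
            tails = {w for w in reachable_forms(rhs + alpha, grammar, max_len)
                     if leftmost_nonterminal(w, grammar) is None}
            for prefix in sorted({k_prefix(w, k) for w in tails}):
                if prefix in owner and owner[prefix] != rhs:
                    return form, owner[prefix], rhs, prefix
                owner[prefix] = rhs
    return None

EXAMPLE_1 = {"S": ["aSb", "aabbb"]}
EXAMPLE_2 = {"S": ["abA", ""], "A": ["Saa", "b"]}
G1 = {"S": ["A", "B"], "A": ["aA", "0"], "B": ["aB", "1"]}

print(find_conflict(EXAMPLE_1, "S", 2) is not None)   # conflict with 2 cells
print(find_conflict(EXAMPLE_1, "S", 3) is None)       # none found with 3
print(find_conflict(EXAMPLE_2, "S", 1) is not None)   # conflict with 1 cell
print(find_conflict(G1, "S", 3) is not None)          # a^3 0 versus a^3 1
```

Within the default bound, the search agrees with the analysis of Examples 1 and 2 and with the argument given for G1.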
