Solution: The goal of predictive parsing is to construct a top-down parser that never
backtracks. To do so, we must transform a grammar in two ways: eliminate left recursion
and perform left factoring.
These rules eliminate the most common causes of backtracking, although they do not guarantee
completely backtrack-free parsing (called LL(1), as we will see later).
Consider this grammar:
A ::= A a
| b
It recognizes the regular expression ba*. The problem is that if we use the first production for
top-down derivation, we will fall into an infinite derivation chain. This is called left recursion.
But how else can you express ba*? Here is an alternative way:
A ::= b A'
A' ::= a A'
 |
where the third production is an empty production (i.e., it is A' ::= ). That is, A' parses the RE
a*. Even though this CFG is recursive, it is not left recursive. In general, for each nonterminal
X, we partition the productions for X into two groups: one that contains the left-recursive
productions, and the other with the rest. Suppose that the first group is:
X ::= X a1
...
X ::= X an
while the second group is:
X ::= b1
...
X ::= bm
where each ai and bj is a sequence of symbols. We then eliminate the left recursion by
rewriting these rules into:
X ::= b1 X'
...
X ::= bm X'
X' ::= a1 X'
...
X' ::= an X'
X' ::=
For example, the CFG G1 is transformed into:
E ::= T E'
E' ::= + T E'
| - T E'
|
T ::= F T'
T' ::= * F T'
| / F T'
|
F ::= num
| id
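The rewriting scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the grammar is stored as a dictionary from nonterminal to a list of right-hand sides (each a list of symbols, with [] for the empty production), only immediate left recursion is handled, and G1 is assumed to be the usual left-recursive expression grammar. The function name eliminate_left_recursion is made up for this sketch.

```python
def eliminate_left_recursion(grammar):
    """Remove immediate left recursion from every nonterminal.

    grammar maps a nonterminal to a list of right-hand sides; each
    right-hand side is a list of symbols and [] is the empty production.
    """
    result = {}
    for x, productions in grammar.items():
        # Group 1: X ::= X a_i (keep only the a_i part).
        recursive = [rhs[1:] for rhs in productions if rhs[:1] == [x]]
        # Group 2: X ::= b_j (all remaining productions).
        others = [rhs for rhs in productions if rhs[:1] != [x]]
        if not recursive:
            result[x] = [list(rhs) for rhs in productions]
            continue
        x_prime = x + "'"  # assumed not to clash with an existing name
        # X ::= b_j X'
        result[x] = [rhs + [x_prime] for rhs in others]
        # X' ::= a_i X'  plus the empty production
        result[x_prime] = [rhs + [x_prime] for rhs in recursive] + [[]]
    return result


# Assumption: G1 is the usual left-recursive expression grammar.
g1 = {
    "E": [["E", "+", "T"], ["E", "-", "T"], ["T"]],
    "T": [["T", "*", "F"], ["T", "/", "F"], ["F"]],
    "F": [["num"], ["id"]],
}
g1_transformed = eliminate_left_recursion(g1)
for lhs, rhss in g1_transformed.items():
    for rhs in rhss:
        print(lhs, "::=", " ".join(rhs))
```

Running the sketch on this g1 yields the E, E', T, T', F productions listed above, with each empty production printed as a bare "::=".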
Suppose now that we have a number of productions for X that have a common prefix in their
rhs (but without any left recursion):
X ::= a b1
...
X ::= a bn
We factor out the common prefix a to get:
X ::= a X'
X' ::= b1
...
X' ::= bn
This is called left factoring and it helps predict which rule to use without backtracking. For
example, the rule from our right associative grammar G2:
E ::= T + E
| T - E
 | T
is translated into:
E ::= T E'
E' ::= + E
| - E
|
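As a rough illustration of left factoring, the following Python sketch performs one round of factoring for a single nonterminal. The representation (right-hand sides as lists of symbols, [] for the empty production) and the helper names common_prefix and left_factor are assumptions of this sketch, not part of the notes. Tails that still share a prefix would need another round.

```python
def common_prefix(seqs):
    """Longest common prefix of a list of symbol lists."""
    prefix = []
    for column in zip(*seqs):  # stops at the shortest sequence
        if any(sym != column[0] for sym in column):
            break
        prefix.append(column[0])
    return prefix


def left_factor(x, productions):
    """One round of left factoring for nonterminal x.

    Productions that start with the same symbol are replaced by one
    production ending in a fresh nonterminal that derives their tails.
    """
    groups = {}
    for rhs in productions:
        groups.setdefault(rhs[0] if rhs else None, []).append(rhs)
    result = {x: []}
    fresh = x
    for first, group in groups.items():
        if first is None or len(group) == 1:
            result[x].extend(group)  # nothing to factor
            continue
        prefix = common_prefix(group)
        fresh += "'"  # X', X'', ... assumed unused elsewhere
        result[x].append(prefix + [fresh])
        result[fresh] = [rhs[len(prefix):] for rhs in group]
    return result


# The rule for E from the right-associative grammar G2.
factored = left_factor("E", [["T", "+", "E"], ["T", "-", "E"], ["T"]])
for lhs, rhss in factored.items():
    for rhs in rhss:
        print(lhs, "::=", " ".join(rhs))
```

For the E rule this produces E ::= T E' together with the three E' productions shown above, the last one being empty.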
As another example, let L be the language of all regular expressions over the alphabet Σ =
{a, b}. That is, L = {"ε", "a", "b", "a*", "b*", "a|b", "(a|b)", ...}. For example, the string
"a(a|b)*|b*" is a member of L. There is no RE that captures the syntax of all REs. Consider,
for example, the RE ((...(a)...)), which is equivalent to the language (^n a )^n for all n, that is,
n opening parentheses, then a, then n closing parentheses. This represents a valid RE, but
there is no RE that can capture its syntax.
A context-free grammar that generates L is:
R ::= R R
 | R "|" R
 | R *
 | ( R )
 | a
 | b
 | "ε"
After elimination of left recursion, this grammar becomes:
R ::= ( R ) R'
 | a R'
 | b R'
 | "ε" R'
R' ::= R R'
 | "|" R R'
 | * R'
 |
THE IDEA. Predictive parsing relies on information about which first symbols can be
generated by the right side of a production. The lookahead symbol guides the selection of the
production A → α to be used:
• if α starts with a token, then the production can be used when the lookahead symbol
matches this token
• if α starts with a nonterminal B, then the production can be used if the lookahead
symbol can be generated from B.
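THE METHOD. For the right side α of a production, FIRST(α) is the set of tokens that can
begin a string derived from α:
$$(\forall a \in V_T) \quad (\exists w \in (V_T \cup V_N)^{*}) \;\; \alpha \Rightarrow^{*} a\,w \iff a \in \mathrm{FIRST}(\alpha) \tag{40}$$
Predictive parsing requires that any two distinct productions A → α and A → β of the same
nonterminal satisfy:
$$\mathrm{FIRST}(\alpha) \cap \mathrm{FIRST}(\beta) = \emptyset. \tag{41}$$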
• The last statement of Algorithm 2 means that the A-production A can be chosen if
For example, consider the grammar:
S → cAd
A → ab | a
Here we have FIRST(cAd) = {c}, FIRST(ab) = {a} and FIRST(a) = {a}.
Unfortunately, if lookahead = a, we cannot tell which A-production to use. Left
factoring would be needed to process this example further (which we will not do here).
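The FIRST computation and the disjointness check can be sketched in Python as a fixed-point iteration. This is a minimal sketch under stated assumptions: the grammar is taken to be S → cAd, A → ab | a (the S production is an assumption suggested by FIRST(cAd)), every key of the dictionary is a nonterminal, every other symbol is a token, and the names first_of_sequence, first_sets and conflicts are made up for this illustration.

```python
EPSILON = "ε"


def first_of_sequence(seq, first):
    """FIRST of a symbol sequence, given FIRST sets of the nonterminals."""
    out = set()
    for sym in seq:
        if sym not in first:  # a token: it starts every derived string
            out.add(sym)
            return out
        out |= first[sym] - {EPSILON}
        if EPSILON not in first[sym]:
            return out
    out.add(EPSILON)  # every symbol of seq can derive the empty string
    return out


def first_sets(grammar):
    """Iterate to a fixed point over all nonterminals."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                new = first_of_sequence(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def conflicts(grammar):
    """Pairs of productions of one nonterminal whose FIRST sets overlap."""
    first = first_sets(grammar)
    found = []
    for nt, productions in grammar.items():
        for i, alpha in enumerate(productions):
            for beta in productions[i + 1:]:
                first_alpha = first_of_sequence(alpha, first)
                first_beta = first_of_sequence(beta, first)
                overlap = first_alpha & first_beta
                if overlap:
                    found.append((nt, alpha, beta, overlap))
    return found


# Assumed grammar: S -> cAd, A -> ab | a
grammar = {"S": [["c", "A", "d"]], "A": [["a", "b"], ["a"]]}
first = first_sets(grammar)
report = conflicts(grammar)
print(report)
```

On this grammar the report contains a single entry, for the two A-productions, whose FIRST sets overlap on the token a.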
Example 16 The following grammar generates a subset of the types in the PASCAL
language.
type → simple | ↑ id | array [ simple ] of type
simple → integer | char | num dotdot num
where
FIRST(↑ id) = { ↑ }
FIRST(array [ simple ] of type) = { array }
FIRST(integer) = { integer }
FIRST(char) = { char }
FIRST(num dotdot num) = { num }
Algorithms 3 and 4 define the procedures proc_simple and proc_type.
Algorithm 3
Algorithm 4
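Since the bodies of the two algorithms are not reproduced here, the following Python sketch shows one way such procedures could look. It is not the original Algorithm 3 or 4. It assumes the standard textbook productions type → simple | ^ id | array [ simple ] of type and simple → integer | char | num dotdot num (the pointer arrow is written ^), a token list as input, and a hypothetical Parser class with a match helper.

```python
class ParseError(Exception):
    pass


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks the end of input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        """Consume the lookahead if it is the expected token."""
        if self.lookahead != expected:
            raise ParseError(f"expected {expected!r}, got {self.lookahead!r}")
        self.pos += 1

    def proc_simple(self):
        # simple -> integer | char | num dotdot num
        if self.lookahead == "integer":
            self.match("integer")
        elif self.lookahead == "char":
            self.match("char")
        elif self.lookahead == "num":
            self.match("num")
            self.match("dotdot")
            self.match("num")
        else:
            raise ParseError(f"unexpected {self.lookahead!r} in simple")

    def proc_type(self):
        # type -> simple | ^ id | array [ simple ] of type
        if self.lookahead in ("integer", "char", "num"):  # FIRST(simple)
            self.proc_simple()
        elif self.lookahead == "^":
            self.match("^")
            self.match("id")
        elif self.lookahead == "array":
            self.match("array")
            self.match("[")
            self.proc_simple()
            self.match("]")
            self.match("of")
            self.proc_type()
        else:
            raise ParseError(f"unexpected {self.lookahead!r} in type")


def parses(tokens):
    """True if the whole token list is derivable from type."""
    parser = Parser(tokens)
    try:
        parser.proc_type()
        parser.match("$")
        return True
    except ParseError:
        return False


print(parses(["array", "[", "num", "dotdot", "num", "]", "of", "integer"]))
```

Each branch is selected by comparing the lookahead with the FIRST set of one right side, so no branch is ever retried. This works because the FIRST sets listed in Example 16 are pairwise disjoint.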