You are on page 1of 47

' $

J. Xue

Lecture 4: Top-Down Parsing: Recursive-Descent


1. Compare and contrast top-down and bottom-up parsing
2. Write a predictive (or non-backtracking) top-down parser
Grammar G

Eliminating left recursion & common prefixes

The Transformed Grammar G′

Constructing First, Follow and Select Sets for G′

A Recursive-Descent Parser The LL(1) Parsing Table

The LL(1) Table-Driving Parser


& %
COMP3131/9102 Page 196 March 22, 2010
' $
J. Xue

The micro-English Grammar Revisited


1 hsentencei → hsubjecti hpredicatei
2 hsubjecti → NOUN
3 | ARTICLE NOUN
4 hpredicatei → VERB hobjecti
5 hobjecti → NOUN
6 | ARTICLE NOUN

The English Sentence

PETER PASSED THE TEST

& %
COMP3131/9102 Page 197 March 22, 2010
' $
J. Xue

The micro-English Grammar Revisited (Cont’d)


• The Leftmost Derivation:
hsentencei =⇒lm hsubjecti hpredicatei by P1
=⇒lm NOUN hpredicatei by P2
=⇒lm NOUN VERB hobjecti by P4
=⇒lm NOUN VERB ARTICLE NOUN by P6

• The Rightmost Derivation:


hsentencei =⇒rm hsubjecti hpredicatei by P1
=⇒rm hsubjecti VERB hobjecti by P4
=⇒rm hsubjecti VERB ARTICLE NOUN by P6
=⇒rm NOUN VERB ARTICLE NOUN by P2

& %
COMP3131/9102 Page 198 March 22, 2010
' $
J. Xue

The Role of the Parser

PETER PASSED THE TEST

Scanner
NOUN1 VERB ARTICLE NOUN2

Parser
hsentencei
hsubjecti hpredicatei
hNOUNi VERB hobjecti
PETER PASSED ARTICLE NOUN
THE TEST

& %
COMP3131/9102 Page 199 March 22, 2010
' $
J. Xue

Two General Parsing Methods


1. Top-down parsing – Build the parse tree top-down:
• Productions used represent the leftmost derivation.
• The best known and widely used methods:
– Recursive descent
– Table-driven
– LL(k) (Left-to-right scan of input, Leftmost derivation, k tokens of
lookahead).
– Almost all programming languages can be specified by LL(1) grammars, but
such grammars may not reflect the structure of a language
– In practice, LL(k) for small k is used
• Implemented more easily by hand.
• Used in parser generators such as JavaCC
2. Bottom-up parsing – Build the parse tree bottom-up:
• Productions used represent the rightmost derivation in reverse.
• The best known and widely used method: LR(1) (Left-to- right scan of input,
Rightmost derivation in reverse, 1 token of lookahead)
• More powerful – every LL(1) is LR(1) but the converse is false
• Used by parser generators (e.g., Yacc and JavaCUP).
& %
COMP3131/9102 Page 200 March 22, 2010
' $
J. Xue

Lookahead Token(s)

• Lookahead Token(s): The currently scanned token(s) in the input.


• In Recogniser.java, currentToken represents the lookahead token
• For most programming languages, one token lookahead only.
• Initially, the lookahead token is the leftmost token in the input.

& %
COMP3131/9102 Page 201 March 22, 2010
' $
J. Xue

Top-Down Parse Of NOUN VERB ARTICLE NOUN


T REE hsentencei

I NPUT NOUN

T REE hsentencei
hsubjecti hpredicatei

I NPUT NOUN

Notations:
• ↑ on the tree indicates the nonterminal being expanded or recognised
• ↑ on the sentence points to the lookahead token
– All tokens to the left of ↑ have been read
– All tokens to the right of ↑ have NOT been processed

& %
COMP3131/9102 Page 202 March 22, 2010
' $
J. Xue

T REE hsentencei

hsubjecti hpredicatei

NOUN

I NPUT NOUN

T REE hsentencei

hsubjecti hpredicatei

NOUN
I NPUT NOUN VERB

& %
COMP3131/9102 Page 203 March 22, 2010
' $
J. Xue

T REE hsentencei

hsubjecti hpredicatei

NOUN VERB hobjecti



I NPUT NOUN VERB

T REE hsentencei

hsubjecti hpredicatei

NOUN VERB hobjecti



I NPUT NOUN VERB ARTICLE

& %
COMP3131/9102 Page 204 March 22, 2010
' $
J. Xue

T REE hsentencei

hsubjecti hpredicatei
NOUN VERB hobjecti

ARTICLE NOUN

I NPUT NOUN VERB ARTICLE

T REE hsentencei

hsubjecti hpredicatei
NOUN VERB hobjecti

ARTICLE NOUN

I NPUT NOUN VERB ARTICLE NOUN

& %
COMP3131/9102 Page 205 March 22, 2010
' $
J. Xue

Top-Down Parsing

• Build the parse tree starting with the start symbol (i.e., the
root) towards the sentence being analysed (i.e., leaves).
• Use one token of lookahead, in general
• Discover the leftmost derivation
I.e, the productions used in expanding the parse tree
represent a leftmost derivation

& %
COMP3131/9102 Page 206 March 22, 2010
' $
J. Xue

Predictive (Non-Backtracking) Top-Down Parsing


• To expand a nonterminal, the parser always predict
(choose) the right alternative for the nonterminal by
looking at the lookahead symbol only.
• Flow-of-control constructs, with their distinguishing
keywords, are detectable this way, e.g., in the VC
grammar:
hstmti → hcompound-stmti
| if ”(” hexpri ”)” (ELSE hstmti)?
| break ”;”
| continue ”;”
···
• Prediction happens before the actual match begins.
& %
COMP3131/9102 Page 207 March 22, 2010
' $
J. Xue

Bottom-Up Parse Of NOUN VERB ARTICLE NOUN

T REE
I NPUT NOUN

T REE hsubjecti
NOUN
I NPUT NOUN

T REE hsubjecti
NOUN
I NPUT NOUN VERB

& %
COMP3131/9102 Page 208 March 22, 2010
' $
J. Xue

T REE hsubjecti
NOUN
I NPUT NOUN VERB ARTICLE

T REE hsubjecti
NOUN
I NPUT NOUN VERB ARTICLE NOUN

T REE hsubjecti hobjecti


NOUN ARTICLE NOUN
I NPUT NOUN VERB ARTICLE NOUN

Note: What if the parser had chosen hsubjecti →ARTICLE NOUN instead of hobjecti →ARTICLE
NOUN?
In this case, the parser would not make any further process.
Having read a hsubjecti and VERB, the parser has reached a state in which it should not parse another hsubjecti.

& %
COMP3131/9102 Page 209 March 22, 2010
' $
J. Xue

T REE hsubjecti hpredicatei

NOUN VERB hobjecti

ARTICLE NOUN
I NPUT NOUN VERB ARTICLE NOUN

T REE hsentencei

hsubjecti hpredicatei

NOUN VERB hobjecti

ARTICLE NOUN
I NPUT NOUN VERB ARTICLE NOUN

& %
COMP3131/9102 Page 210 March 22, 2010
' $
J. Xue

Bottom-Up Parsing

• Build the parse tree starting with the the sentence being
analysed (i.e., leaves) towards the start symbol (i.e., the
root).
• Use one token of lookahead, in general.
• The basic (smallest) language constructs recognised first,
then they are used to discover more complex constructs.
• Discover the rightmost derivation in reverse — the
productions used in expanding the parse tree represent a
rightmost derivation in reverse order

& %
COMP3131/9102 Page 211 March 22, 2010
' $
J. Xue

Lecture 4: Top-Down Parsing: Recursive-Descent



1. Compare and contrast top-down and bottom-up parsing
2. Write a predictive (or non-backtracking) top-down parser
Grammar G Assignment 2!

Eliminating left recursion & common prefixes

The Transformed Grammar G′

Constructing First, Follow and Select Sets for G′

a Recursive-Descent Parser The LL(1) Parsing Table

The LL(1) Table-Driving Parser


& %
COMP3131/9102 Page 212 March 22, 2010
' $
J. Xue

Which of the Two Alternatives on S to Choose?


• Grammar:
S → aA | bB
A → ···
B → ···
• Sentence: a · · ·
• The leftmost derivation:
S =⇒lm aA
=⇒lm · · ·

Select the first alternative aA

& %
COMP3131/9102 Page 213 March 22, 2010
' $
J. Xue

Which of the Two Alternatives on S to Choose?


• Grammar:
S → Ab | Bc
A → Df | CA
B → gA | e
C → dC | c
D→h|i
• Sentence: gchf c
• The leftmost derivation:
S =⇒lm Bc =⇒lm gAc =⇒lm gCAc
=⇒lm gcAc =⇒lm gcDf c =⇒lm gchf c

& %
COMP3131/9102 Page 214 March 22, 2010
' $
J. Xue

Intuition behind First Sets


• Grammar:
S → Ab | Bc
A → Df | CA
B → gA | e
C → dC | c
D→h|i
=⇒hf b
=⇒Df b =⇒if b
=⇒Ab
=⇒dCAb=⇒ · · ·
=⇒CAb
S =⇒cAb=⇒ · · ·
=⇒gAc=⇒ · · ·
=⇒Bc
=⇒ec=⇒ · · ·
• All possible leftmost derivations:
First(Ab) = {c, d, h, i} First(Bc) = {e, g}
& %
COMP3131/9102 Page 215 March 22, 2010
' $
J. Xue

Definition of First Sets

First(α):
• The set of all terminals that can begin any strings derived
from α.
• if α=⇒∗ ǫ, then ǫ is also in First(α)

Nullable Nonterminals

A nonterminal A is nullable if A=⇒∗ ǫ.

& %
COMP3131/9102 Page 216 March 22, 2010
' $
J. Xue

A Procedure to Compute First(α)


1. Case 1: α is a single symbol or ǫ:
If α is a terminal a, then First(α) = First(a) = {a}
else if α is ǫ, then First(α) = First(ǫ) = {ǫ}
else if α is a nonterminal and α→β1 | β2 | β3 | · · · then
First(α) = ∪k First(βk )
2. Case 2: α = X1 X2 · · · Xn :
If X1 X2 . . . Xi is nullable but Xi+1 is not, then
First(α) = First(X1 ) ∪ First(X2 ) ∪ · · · ∪ First(Xi+1 )
If i = n is nullable, then add ǫ to First(α)

& %
COMP3131/9102 Page 217 March 22, 2010
J. Xue

' $

Case 2 of the Procedure for Computing First


S → ABCd
A→e|f |ǫ
B→g|h|ǫ
C →p|q

eBCd=⇒
S=⇒ ABCd f BCd=⇒ gCd=⇒
BCd=⇒
hCd=⇒ pd
Cd=⇒
qd
First(ABCd) = {e, f, g, h, p, q}
& %
COMP3131/9102 Page 218 March 22, 2010
J. Xue

' $

Case 2 of the Procedure for Computing First Again


S → ABC d was deleted from the grammar in slide 218
A→e|f |ǫ
B→g|h|ǫ
C →p|q|ǫ

eBC=⇒
S=⇒ ABC f BC=⇒ gC=⇒
BC=⇒
hC=⇒ p
C=⇒ q
ǫ
First(ABC) = {e, f, g, h, p, q, ǫ}
& %
COMP3131/9102 Page 219 March 22, 2010
' $
J. Xue

The Expression Grammar


• The grammar with left recursion:
Grammar 1: E → E + T | E − T | T
T → T ∗ F | T /F | F
F → INT | (E)
• The transformed grammar without left recursion:
Grammar 2: E → T Q
Q → +T Q | − T Q | ǫ
T → FR
R → ∗F R | /F R | ǫ
F → INT | (E)

& %
COMP3131/9102 Page 220 March 22, 2010
' $
J. Xue

First Sets for Grammar 1 (with Recursion)


First(E) = First(E + T ) = First(E − T ) =
First(T ) = First(T ∗ F ) = First(T /F ) =
First(F ) = {(, INT}
First((E)) = {(}
First(INT) = {INT}

The explicit construction is left as an exercise.

& %
COMP3131/9102 Page 221 March 22, 2010
' $
J. Xue

First Sets for Grammar 2 (without Recursion)


First(E) = First(T Q) = {(, INT}
First(T ) = First(F R) = {(, INT}
First(Q) = {+, −, ǫ}
First(R) = {∗, /, ǫ}
First(F ) = {INT, (}
First(+T Q) = {+}
First(−T Q) = {−}
First(∗F R) = {∗}
First(/F R) = {/}
First((E)) = {(}
First(INT) = {INT}

& %
COMP3131/9102 Page 222 March 22, 2010
' $
J. Xue

Why Follow Sets?

• First sets do not tell us when to apply A→α such that


α=⇒∗ ǫ (the important special case is A→ǫ)
• Follow sets do
• Follow sets constructed only for nonterminals
• By convention, assume every input is terminated by a
special end marker (i.e., the EOF marker), denoted $
• Follow sets do not contain ǫ

& %
COMP3131/9102 Page 223 March 22, 2010
' $
J. Xue

Definition of Follow Sets

Let A be a nonterminal. Define Follow(A) to be the set of


terminals that can appear immediately to the right of A in
some sentential form. That is,
Follow(A) = {a | S=⇒∗ · · · Aa · · · }
where S is the start symbol of the grammar.

& %
COMP3131/9102 Page 224 March 22, 2010
' $
J. Xue

A Procedure to Compute Follow Sets

1. If A is the start symbol, add $ to Follow(A).


2. Look through the grammar for all occurrences of A on the
right of productions. Let a typical production be:
B → αAβ
There are two cases – both may be applicable:
(a) Follow(A) includes First(β) − {ǫ}.
(b) If β=⇒∗ ǫ, then include Follow(B) in Follow(A).

& %
COMP3131/9102 Page 225 March 22, 2010
' $
J. Xue

Follow Sets for Grammar 1 (with Recursion)

Follow(E) = {+, −, ), $}
Follow(T ) = Follow(F ) = {+, −, ∗, /, ), $}

The explicit construction is left as an exercise.

& %
COMP3131/9102 Page 226 March 22, 2010
' $
J. Xue

Follow Sets for Grammar 2 (without Recursion)

Follow(E) = {), $}
Follow(Q) = {), $}
Follow(T ) = {+, −, ), $}
Follow(R) = {+, −, ), $}
Follow(F ) = {+, −, ∗, /, ), $}

& %
COMP3131/9102 Page 227 March 22, 2010
' $
J. Xue

Select Sets for Productions

• One Select set for every production in the grammar:


• The Select set for a production of the form A→α is:
– If ǫ ∈ First(α), then
Select(A→α) = (First(α) − {ǫ}) ∪ Follow(A)
– Otherwise:
Select(A→α) = First(α)
• The Select set predicts A→α to be used in a derivation.
• Thus, the Select not needed if A has has one alternative

& %
COMP3131/9102 Page 228 March 22, 2010
' $
J. Xue

Select Sets for Grammar 1

Follow sets not used since the grammar has no ǫ-productions


Select(E→E + T ) = First(E + T ) = {(, INT}
Select(E→E − T ) = First(E − T ) = {(, INT}
Select(E→T ) = First(T ) = {(, INT}
Select(T →T ∗ F ) = First(T ∗ F ) = {(, INT}
Select(T →T /F ) = First(T /F ) = {(, INT}
Select(T →F ) = First(F ) = {(, INT}
Select(F →INT) = First(INT) = {INT}
Select(F →(E)) = First((E)) = {(}

& %
COMP3131/9102 Page 229 March 22, 2010
' $
J. Xue

Select Sets for Grammar 2

Select(E→T Q) = First(T Q) = {(, INT}


Select(Q→ + T Q) = First(+T Q) = {+}
Select(Q→ − T Q) = First(−T Q) = {−}
Select(Q→ǫ) = (First(ǫ) − {ǫ}) ∪ Follow(Q) = {), $}
Select(T →F R) = First(F R) = {(, INT}
Select(R→ ∗ F R) = First(+F R) = {∗}
Select(R→/F R) = First(/F R) = {/}
Select(R→ǫ) = (First(ǫ) − {ǫ}) ∪ Follow(T ) = {+, −, ), $}
Select(F →INT) = First(INT) = {INT}
Select(F →(E)) = First((E)) = {(}

& %
COMP3131/9102 Page 230 March 22, 2010
' $
J. Xue

Outline for the Rest of the Lecture

1. Definition of LL(1) grammar


2. One simplification in the presence of a nullable alternative
3. Eliminate left recursion and common prefixes
4. Write parsing methods in the presence of regular operators
5. LL(k) for small k often necessary for some constructs

& %
COMP3131/9102 Page 240 March 22, 2010
' $
J. Xue

Definition of LL(1) Grammar

• A grammar is LL(1) if for every nonterminal of the form:


A → α1 | · · · | αn
the select sets are pairwise disjoint, i.e.:
Select(A→αi ) ∩ Select(A→αj ) = ∅
for all i and j such that i 6= j.
• This implies there can be at most one nullable alternative

& %
COMP3131/9102 Page 241 March 22, 2010
' $
J. Xue

Left Recursion
• Direct left-recursion:
A → Aα
• Non-direct left-recursion:
A → Bα
B → Aβ
– Algorithm 4.1 of text eliminates both kinds of left recursion
– In real programming languages, non-direct left-recursion is rare
– Not required
• A grammar with left recursion is not LL(1)

& %
COMP3131/9102 Page 246 March 22, 2010
' $
J. Xue

Eliminating Direct Left Recursion


• The grammar G1 :
A → α1 | α2 | · · · | αn // αi does not beging with A
A → Aβ1 | Aβ2 | · · · | Aβm
• The transformed grammar G2 :
A → α1 A′ | α2 A′ | · · · | αn A′
A′ → β1 A′ | β2 A′ | · · · | βm A′ | ǫ
• G1 and G2 define the same language: L(G1 ) = L(G2 )
• Example: in Slide 220, Grammar 2 is the transformed
version of Grammar 1

& %
COMP3131/9102 Page 247 March 22, 2010
' $
J. Xue

Eliminating Direct Left Recursion: Special Case


• The grammar G1 :
A → α // α does not beging with A
A → Aβ
• The transformed grammar G2 :
A → αA′
A′ → βA′ | ǫ

& %
COMP3131/9102 Page 248 March 22, 2010
' $
J. Xue

Eliminating Common Prefixes: Left-Factoring


• The grammar G1 :
A → αβ1 | αβ2 | · · · | αβm
A → γ
• The transformed grammar G2 :
A → αA′
A → γ
A′ → β1 | β2 | · · · | βm
• L(G1 ) = L(G2)
• A grammar with common prefixes is not LL(1)

& %
COMP3131/9102 Page 250 March 22, 2010
' $
J. Xue

Example: Eliminating Common Prefixes

hstmti → IF ”(” hexpri ”)” hstmti ELSE hstmti


hstmti → IF ”(” hexpri ”)” hstmti
hstmti → other

hstmti → IF ”(” hexpri ”)” hstmti helse-cluasei
helse-clausei → ELSE hstmti | ǫ
hstmti → other

& %
COMP3131/9102 Page 251 March 22, 2010
' $
J. Xue

Eliminating Direct Left Recursion Using Regular Operators


• The grammar G1 :
A → α1 | α2 | · · · | αn
A → Aβ1 | Aβ2 | · · · | Aβm
• The transformed grammar G2 :
A → (α1 | · · · | αn )(β1 | · · · | βm )∗
• G1 and G2 define the same language: L(G1 ) = L(G2 )
• Recommended to use in Assignment 2, where n = m = 1
for most left-recursive cases:
A → αβ ∗

& %
COMP3131/9102 Page 252 March 22, 2010
' $
J. Xue

The Expression Grammar


• The grammar with left recursion:
Grammar 1: E → E + T | E − T | T
T → T ∗ F | T /F | F
F → INT | (E)
• Eliminating left recursion using the Kleene Closure
Grammar 3: E → T (”+” T | ”-” T )∗
T → F (”*” F | ”/” F )∗
F → INT | “(” E “)”
All tokens are enclosed in double quotes to distinguish
them for the regular operators: (, ) and ∗
• Compare with Slide 220
& %
COMP3131/9102 Page 253 March 22, 2010
' $
J. Xue

Eliminating Common Prefixes using Choice Operator


• The grammar G1 :
A → αβ1 | αβ2 | · · · | αβm
A → γ
• The transformed grammar G2 :
A → α(β1 | β2 | · · · | βm )
A → γ
• Recommended to use in Assignment 2

& %
COMP3131/9102 Page 254 March 22, 2010
' $
J. Xue

Example: Eliminating Common Prefixes

hstmti → IF ”(” hexpri ”)” hstmti ELSE hstmti


hstmti → IF ”(” hexpri ”)” hstmti
hstmti → other

hstmti → IF ”(” hexpri ”)” hstmti ( ELSE hstmti)?
hstmti → other

Compare with Slide 251

& %
COMP3131/9102 Page 255 March 22, 2010
' $
J. Xue

LL(k) Grammar and Parsing

• A grammar is LL(k) if it can be parsed deterministically


using k tokens of lookahead
• A formal definition for LL(k) grammars can be found in
Grune and Jacobs’ book es
• Grammar 1 in Slide 220 is not LL(k) for any k!
• However, Grammar 2 in Slide 220 is LL(1)
• Only a understanding of LL(1) is required this year ear

& %
COMP3131/9102 Page 257 March 22, 2010

You might also like