You are on page 1of 31

COMP

412
FALL 2017

Parsing II
Top-down parsing

Comp 412

source IR … IR target
code code
Front End OpMmizer Back End

Copyright 2017, Keith D. Cooper & Linda Torczon, all rights reserved.
Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these
materials for their personal use.
Faculty from other educaMonal insMtuMons may use these materials for nonprofit educaMonal purposes,
provided this copyright noMce is preserved.

Chapter 3 in EaC2e
Ambiguity Review

Defini9ons
•  A context-free grammar G is ambiguous if there exists has more than
one leTmost derivaMon for some sentence in L(G)
•  A context-free grammar G is ambiguous if there exists has more than
one rightmost derivaMon for some sentence in L(G)
•  The leTmost and rightmost derivaMons for a sentenMal form may differ,
even in an unambiguous grammar
–  However, they must have the same parse tree

COMP 412, Fall 2017 1


Ambiguity Review

We talked about syntac9c ambiguity


•  Ambiguity in the context-free syntax
–  Classic example is the if-then-else grammar

0 Stmt ➝ if Expr then Stmt


1 | if Expr then Stmt else Stmt
2 | … other statements …

–  Fix ambiguity in context-free grammar by re-wriMng the grammar

0 Stmt ➝ if Expr then Stmt


1 | if Expr then WithElse else Stmt
2 | … other statements …
3 WithElse ➝ if Expr then WithElse else WithElse
4 | … other statements …

COMP 412, Fall 2017 3


Ambiguity
We must also deal with seman9c ambiguity
•  One syntax with two meanings
–  Classic example arose in Algol-like languages
A = f(17,21)
Is this a call to a funcMon f? or a reference to an element of an array f?
•  DisambiguaMng this kind of confusion requires context
–  Need the value of the declaraMon for f
–  An issue of type, not syntax
–  Requires either:
1.  An extra-grammaMcal soluMon
➝  Manage the ambiguity by accepBng language and deferring disambiguaBon unBl
the compiler has enough context (e.g., type informaBon)
2.  A different syntax
→  Fix ambiguity in meaning by changing the language (e.g., C’s [ ] or BCPL’s ! )

COMP 412, Fall 2017 4


Order of OperaMons or Precedence
Consider again the deriva9on of x – 2 * y
0 Expr ➝ Expr Op Value Rule Senten*al Form
1 | Value — Expr
2 Value ➝ number 0 Expr Op Value
3 | idenMfier 0 Expr Op Value Op Value
4 Op ➝ plus 1 Value Op Value Op Value
5 | minus 3 <id,x> Op Value Op Value
6 | Mmes 5 <id,x> — Value Op Value
7 I divide 2 <id,x> — <num,2> Op Value
6 <id,x> — <num,2> * Value
3 <id,x> — <num,2> * <id,y>

LeHmost deriva9on

COMP 412, Fall 2017 5


Order of OperaMons
The leHmost deriva9on is unique, but it specifies the wrong precedence
Rule Senten*al Form
Expr
— Expr
0 Expr Op Value
0 Expr Op Value Op Value Expr Op Value
1 Value Op Value Op Value
3 <id,x> Op Value Op Value Mmes <id,y>
5 <id,x> — Value Op Value
Expr Op Value
2 <id,x> — <num,2> Op Value
6 <id,x> — <num,2> * Value
Value minus <num,2>
3 <id,x> — <num,2> * <id,y>

<id,x>
Evaluates (x - 2) * y
Elimina9ng ambiguity does not necessarily produce the desired meaning. It
produces a consistent meaning, but that meaning can be consistently wrong.
COMP 412, Fall 2017 6
Order of OperaMons
How do you add precedence to a grammar?

To add precedence
•  Decide how many levels of precedence the grammar needs
•  Create a nonterminal for each level of precedence
•  Isolate the corresponding part of the grammar
•  Force the parser to recognize high precedence subexpressions first

For algebraic expressions, including (), +, —, *, and /


•  Parentheses first (level 1 )
•  MulMplicaMon and division, next (level 2)
•  SubtracMon and addiMon, last (level 3)

COMP 412, Fall 2017 7


DerivaMons and Precedence
Adding the standard algebraic precedence produces:
0 Goal ➝ Expr The new grammar is larger (7 vs. 9)
1 Expr ➝ Expr + Term •  Takes more rewriMng to reach
level some of the terminal symbols
2 | Expr - Term
3
3 | Term •  Encodes expected precedence
4 Term ➝ Term * Factor •  Produces same parse tree under
level 5 | Term / Factor leTmost & rightmost derivaMons
2
6 | Factor •  Correctness trumps the speed of
7 Factor ➝ ( Expr ) the parser
level
8 | number Let’s see how it parses x + 2 * y
1
9 | id

The “classic expression grammar” Both parentheses & precedence


are beyond the power of an RE
See also Figure 7.7 on p. 351 of EaC2e

COMP 412, Fall 2017 8


DerivaMons and Precedence
Rule Senten*al Form
Goal G

0 Expr
E
2 Expr — Term
3 Term — Term E — T

6 Factor — Term
T T * F
9 <id,x> — Term
4 <id,x> — Term * Factor F F <id,y>
6 <id,x> — Factor * Factor
<id,x> — <num,2> * Factor <id,x> <num,2>
8
9 <id,x> — <num,2> * <id,y>
The le9most deriva*on Parse tree for x – 2 * y
The classic expression grammar derives x — ( 2 * y ) with the parse tree shown..
Both the leTmost and rightmost derivaMons give the same parse tree and value,
because the grammar directly and explicitly encodes the desired precedence.
COMP 412, Fall 2017 9
DerivaMons and Precedence
Rule Senten*al Form
Goal G

0 Expr
E
2 Expr — Term
4 Expr — Term * Factor E — T

9 Expr — Term * <id,y>


T T * F
6 Expr — Factor * <id,y>
8 Expr — <num,2> * <id,y> F F <id,y>
3 Term — <num,2> * <id,y>
Factor — <num,2> * <id,y> <id,x> <num,2>
6
9 <id,x> — <num,2> * <id,y>
The rightmost deriva*on Parse tree for x – 2 * y
The classic expression grammar derives x — ( 2 * y ) with the parse tree shown..
Both the leTmost and rightmost derivaMons give the same parse tree and value,
because the grammar directly and explicitly encodes the desired precedence.
COMP 412, Fall 2017 10
Parsing Techniques
Top-down parsers (LL(1), recursive descent)
•  Start at the root of the parse tree and grow toward leaves
•  Pick a producMon & try to match the input G
•  Bad “pick” ⇒ may need to backtrack
E
•  Large class of grammars are backtrack-free
E - T

Bo[om-up parsers (LR(1), operator precedence) T T * F


•  Start at the leaves and grow toward root
F F <id,y>
•  As input is consumed, encode possibiliMes
in an internal state <id,x> <num,2>
•  Start in a state valid for legal first tokens
Parse tree for x - 2 * y
•  We can make the process determinisMc

Borom-up parsers can recognize a larger class of grammars than can top-down parsers. 11
COMP 412, Fall 2017
Top-Down Parsing
We will examine two ways of implemen9ng top-down parsers

<word, category> pairs Recursive IR


Scanner
Descent Parser

Recursive-Descent Parser
•  Highly efficient, highly flexible form of parser
•  Typically implemented as a hand-coded parser
–  Set of mutually-recursive rouMnes
–  Works well for any “backtrack free” or “predicMvely parsable” grammar
•  Easy to understand, easy to implement

COMP 412, Fall 2017 12


Top-Down Parsing
We will examine two ways of implemen9ng top-down parsers

<word, category> pairs Skeleton IR


Scanner Table
LL Parser

Knowledge encoded in
tables to drive skeleton

specifica9ons (as a CFG) LL(1) Parser


Generator
Lab 2

Table-driven LL(1) Parser


•  LL(1) Parser Generator takes as input a CFG that is backtrack free
•  Skeleton Parser interprets the table produced by the generator
•  In Lab 2, you will implement an LL(1) table generator
–  Your table generator will use a recursive-descent parser as its front end

COMP 412, Fall 2017 13


Top-down Parsing
The Algorithm
•  A top-down parser starts with the root of the parse tree
•  The root node is labeled with the goal symbol of the grammar

Construct the root node of the parse tree


Repeat unBl lower fringe of the parse tree matches the input string
1.  At a node labeled with NT A, select a producBon with A on its LHS
and, for each symbol on its RHS, construct the appropriate child
2.  When a terminal symbol is added to the fringe and it doesn’t
match the fringe, backtrack
3.  Find the next node to be expanded (label ∈ NT)

The key is picking the right producMon in step 1


–  That choice should be guided by the input string

COMP 412, Fall 2017 15


The “Classic” Expression Grammar
Consider the Classic Expression Grammar

0 Goal ➝ Expr
1 Expr ➝ Expr + Term
2 | Expr - Term
3 | Term
4 Term ➝ Term * Factor And the input x – 2 * y
5 | Term / Factor
6 | Factor
7 Factor ➝ ( Expr )
8 | number
9 | id

COMP 412, Fall 2017 16


Example
Let’s try to derive x – 2 * y :
Goal

Rule Senten*al Form Input


— Goal ↑x - 2 * y

Lower fringe of the parMally completed parse tree

↑ is the posiBon in the input buffer


Build a le9most derivaBon, to work
with a le\-to-right scanner.
COMP 412, Fall 2017 17
Example
Let’s try to derive x – 2 * y :
Goal

Rule Senten*al Form Input


Expr
— Goal ↑x - 2 * y
0 Expr ↑x - 2 * y Expr + Term

1 Expr + Term ↑x - 2 * y
Term
3 Term + Term ↑x - 2 * y
6 Factor + Term ↑x - 2 * y Fact.
9 <id,x> + Term ↑x - 2 * y
<id,x>
→ <id,x> + Term x ↑- 2 * y

This worked well, except that “–” doesn’t match “+”


The parser must backtrack to here
↑ is the posiBon in the input buffer
COMP 412, Fall 2017 18
Example
Con9nuing with x – 2 * y :
Goal

Rule Senten*al Form Input


Expr
— Goal ↑x - 2 * y
0 Expr ↑x - 2 * y Expr – Term

2 Expr -Term ↑x - 2 * y
Term
3 Term -Term ↑x - 2 * y
6 Factor -Term ↑x - 2 * y Fact.
9 <id,x> - Term ↑x - 2 * y
<id,x>
→ <id,x> -Term x ↑- 2 * y
→ <id,x> -Term x - ↑2 * y
Now, “-” and “-” match Now we can expand Term to match “2”

⇒  Now, we need to expand Term - the last NT on the fringe


COMP 412, Fall 2017 19
Example
Trying to match the “2” in x – 2 * y :
Goal

Rule Senten*al Form Input


Expr
→ <id,x> - Term x - ↑2 * y
6 <id,x> - Factor x - ↑2 * y Expr – Term

8 <id,x> - <num,2> x - ↑2 * y
Term Fact.
→ <id,x> - <num,2> x - 2 ↑* y
Fact. <num,2>
Where are we?
•  “2” matches “2” <id,x>
•  We have more input, but no NTs leT to expand
•  The expansion terminated too soon
⇒  Need to backtrack

COMP 412, Fall 2017 20


Example
Trying again with “2” in x – 2 * y :
Goal
Rule Senten*al Form Input
→ <id,x> - Term x - ↑2 * y Expr

4 <id,x> - Term * Factor x - ↑2 * y Expr – Term


6 <id,x> - Factor * Factor x - ↑2 * y
8 <id,x> - <num,2> * Factor x - ↑2 * y Term Term * Fact.

→ <id,x> - <num,2> * Factor x - 2 ↑* y


Fact. Fact. <id,y>
→ <id,x> - <num,2> * Factor x - 2 * ↑y
9 <id,x> - <num,2> * <id,y> x - 2 * ↑y <id,x> <num,2>
→ <id,x> - <num,2> * <id,y> x - 2 * y↑

The Point:
This Mme, we matched & consumed all the input
⇒ For efficiency, the parser must make the correct choice when it expands a
Success! NT.
Wrong choices lead to wasted effort.

COMP 412, Fall 2017 21


Another possible parse
Other choices for expansion are possible
Rule Senten*al Form Input
— Goal ↑x - 2 * y
0 Expr ↑x - 2 * y Consumes no input!
1 Expr +Term ↑x - 2 * y
1 Expr + Term +Term ↑x - 2 * y

1 Expr + Term +Term + Term ↑x - 2 * y
1 … and so on …. ↑x - 2 * y

This expansion doesn’t terminate


•  Wrong choice of expansion leads to non-terminaMon
•  Non-termina*on is a bad property for a parser to have
•  Parser must make the right choice

COMP 412, Fall 2017 22


The Classic Expression Grammar

0 Goal ➝ Expr The possibility of an infinite sequence


1 Expr ➝ Expr + Term of expansions in a parser is bad. disastrous
2 | Expr - Term •  The problem arises from le\
3 | Term recursion in the grammar and a
4 Term ➝ Term * Factor le\most derivaMon1
5 | Term / Factor •  LHS symbol cannot appear at start
6 | Factor of the RHS
7 Factor ➝ ( Expr ) –  Cannot derive from it in mulBple
8 | number steps, either
9 | id •  Top down parsers build leTmost
derivaMons, so grammars with leT
Classic Expression Grammar recursion are not suitable for top-
down parsing
1 Similar problem arises with right recursion and a rightmost derivaMon.

COMP 412, Fall 2017 23


LeT Recursion

Top-down parsers cannot handle leH-recursive grammars

Formally,
A grammar is leH recursive if ∃ A ∈ NT such that a derivaMon
A ⇒+ Aα exists, for some string α ∈ (NT ∪ T ) +

Our classic expression grammar is leT recursive


•  This can lead to non-terminaMon in a top-down parser
•  In a top-down parser, any recursion must be right recursion
•  We would like to convert the leT recursion to right recursion
Non-termina*on is always a bad property in a compiler

Fortunately, we can eliminate leT recursion in an algorithmic way.

Right recursion is defined in a symmetric way.


COMP 412, Fall 2017 24
EliminaMng LeT Recursion
To remove leH recursion, we can transform the grammar
Consider a grammar fragment of the form
Fee → Fee α Language is β followed
| β by 0 or more α’s

where neither α nor β start with Fee

We can rewrite this fragment as


Fee → β Fie The new grammar defines the
same language as the old
Fie → α Fie
grammar, using only right
| ε recursion.
where Fie is a new non-terminal

Added a reference to
the empty string
New Idea: the ε producMon
COMP 412, Fall 2017 25
EliminaMng LeT Recursion
The expression grammar contains two cases of leH recursion
Expr → Expr + Term Term → Term * Factor
| Expr - Term | Term * Factor
| Term | Factor

Applying the transformaMon yields

Expr → Term Expr’ Term → Factor Term’


Expr’ + Term Expr’
→ Term’ → * Factor Term’
| - Term Expr’ | / Factor Term’
| ε | ε

These fragments use only right recursion

COMP 412, Fall 2017 26


EliminaMng LeT Recursion
Subs9tu9ng them back into the grammar yields

0 Goal → Expr
Right-recursive expression grammar
1 Expr → Term Expr’
•  This grammar is correct, if
2 Expr’ → + Term Expr’ somewhat counter-intuiMve.
3 | - Term Expr’ •  A top-down parser will terminate
4 | ε using it.
5 Term → Factor Term’ •  A top-down parser may need to
6 Term’ → * Factor Term’ backtrack with it.
7 | / Factor Term’ •  It is leT associaMve, as was the
8 | ε original
9 Factor → ( Expr ) ⇒  Why didn’t we just rewrite it so Expr
was at the right end of the RHS,
10 | number
rather than the leT end?
11 | id

COMP 412, Fall 2017 27


EliminaMng LeT Recursion
Subs9tu9ng them back into the grammar yields

0 Goal → Expr
Right-recursive expression grammar
1 Expr → Term Expr’
•  This grammar is correct, if
2 Expr’ → + Term Expr’ somewhat counter-intuiMve.
NOTE: This technique eliminates direct
3 | - Term Expr’ • leT recursion — when a producMon’s
A top-down parser will terminate
4 | ε using it.
RHS begins with its own LHS.
5 Term → Factor Term’ • It does not address indirect leT
A top-down parser may need to
6 Term’ → * Factor Term’ recursion. We will get there, in a
backtrack with it.
couple of slides …
7 | / Factor Term’ •  It is leT associaMve, as was the
8 | ε original
9 Factor → ( Expr ) ⇒  Why didn’t we just rewrite it so Expr
was at the right end of the RHS,
10 | number
rather than the leT end?
11 | id

COMP 412, Fall 2017 28


Associa9vity
EliminaMng LeT Recursion
No9ce that we do not use the naïve right-recursive form

Expr → Term Expr’ Expr → Term + Expr


Expr’ + Term Expr’
→ | Term - Expr
| - Term Expr’ | Term
| ε

Transformed grammar fragment Naïve right-recursive form

The naïve right-recursive form generates a different associaMvity (and


parse tree) than did the original grammar.
The transformed grammar fragment generates the same parse tree as
the original grammar did. (See § 3.5.3 in EaC2e.)

COMP 412, Fall 2017 29


Associa9vity
EliminaMng LeT Recursion
The naïve right-recursive form changes the associa9vity

Expr → Term Expr’ Expr → Term + Expr
Expr’ + Term Expr’
→ | Term - Expr
| - Term Expr’ | Term
| ε

Transformed grammar fragment Naïve right-recursive form

+ +

+ z w +

+ y x +

w x y z

ASTs for w + x + y + z
COMP 412, Fall 2017 30
Parsing with the RR CEG

0 Goal → Expr Rule Senten*al Form x – 2 * y, again


1 Expr → Term Expr’ — Goal
2 Expr’ → + Term Expr’ 0 Expr
3 | - Term Expr’ 1 Term Expr’

4 | ε 5 Factor Term’ Expr’

5 Term → Factor Term’ 11 <id,x> Term’ Expr’


8 <id,x> Expr’
6 Term’ → * Factor Term’
3 <id,x> - Term Expr’
7 | / Factor Term’
5 <id,x> - Factor Term’ Expr’
8 | ε
10 <id,x> - <num,2> Term’ Expr’
9 Factor → ( Expr )
6 <id,x> - <num,2> * Factor Term’ Expr’
10 | number
11 <id,x> - <num,2> * <id,y> Term’ Expr’
11 | id
8 <id,x> - <num,2> * <id,y> Expr’
Right Recursive CEG 4 <id,x> - <num,2> * <id,y>

COMP 412, Fall 2017 31


Parsing with RR CEG Goal x – 2 * y (again)

Text of slide
Rule Senten*al Form Expr
— Goal
Term Expr’
0 Expr
1 Term Expr’ minus Term Expr’
5 Factor Term’ Expr’
11 <id,x> Term’ Expr’
Factor Term’ ε
8 <id,x> Expr’
3 <id,x> - Term Expr’ <num,2>
5 <id,x> - Factor Term’ Expr’
10 <id,x> - <num,2> Term’ Expr’ Mmes Factor Term’
6 <id,x> - <num,2> * Factor Term’ Expr’
11 <id,x> - <num,2> * <id,y> Term’ Expr’ <id,y> ε
8 <id,x> - <num,2> * <id,y> Expr’
Factor Term’
4 <id,x> - <num,2> * <id,y>
<id,x> ε
COMP 412, Fall 2017 32

You might also like