You are on page 1of 25

Recursive Descent Parsing

!  Goal
•  Determine if we can produce the string to be parsed from the
grammar's start symbol
!  Approach
•  Construct a parse tree: starting with start symbol at root,
recursively replace nonterminal with RHS of production
!  Key question: which production to use?
•  There can be many productions for each nonterminal
•  We could try them all, backtracking if we are unsuccessful
•  But this is slow!
!  Answer: lookahead
•  Keep track of next token on the input string to be processed
•  Use this to guide selection of production

CMSC 330 - Spring 2011 1


Recursive Descent Example
E
E → id = n | { L }
{ L }
L→E;L|ε
E ; L
{x=3;{y=4;};}
x = 3 E ; L

lookahead { L } ε

E ; L

y = 4 ε

CMSC 330 - Spring 2011 2


Recursive Descent: Basic Strategy
!  Initially, “current node” is start node
!  When processing the current node, 4 possibilities
•  Node is the empty string
Ø  Move to next node in DFS order that has not yet been
processed
•  Node is a terminal that matches lookahead
Ø  Advance lookahead by one symbol and move to next node in
DFS order to be processed
•  Node is a terminal that does not match lookahead
Ø  Fail! String cannot be parsed
•  Node is a nonterminal
Ø  Pick a production based on lookahead, generate children, then
process children recursively
CMSC 330 - Spring 2011 3
Recursive Descent Parsing (cont.)
!  Key step
•  Choosing which production should be selected
!  Two approaches
•  Backtracking
Ø  Choose some production
Ø  If fails, try different production

Ø  Parse fails if all choices fail

•  Predictive parsing
Ø  Analyze grammar to find “First sets” for productions
Ø  Compare with lookahead to decide which production to select

Ø  Parse fails if lookahead does not match First

CMSC 330 - Spring 2011 4


First Sets
!  Example
•  Suppose the lookahead is x
•  For grammar S → xyz | abc
Ø  Select S → xyz since 1st terminal in RHS matches x
•  For grammar S → A | B A→x |y B→z
Ø  Select S → A, since A can derive string beginning with x
!  In general
•  We want to choose a production that can derive a
sentential form beginning with the lookahead
•  Need to know what terminals may be first in any
sentential form derived from a nonterminal / production
CMSC 330 - Spring 2011 5
First Sets
!  Definition
•  First(γ), for any sentential form γ, is the set of initial
terminals of all strings that γ may expand to
•  We’ll use this to decide what production to apply
!  Examples
•  Given grammar S → xyz | abc
Ø  First(xyz) = { x }, First(abc) = { a }
Ø  First(S) = First(xyz) U First(abc) = { x, a }

•  Given grammar S → A | B A→x |y B→z


Ø  First(x) = { x }, First(y) = { y }, First(A) = { x, y }
Ø  First(z) = { z }, First(B) = { z }
Ø  First(S) = { x, y, z }

CMSC 330 - Spring 2011 6


Calculating First(γ)
!  For a terminal a
•  First(a) = { a }
!  For a nonterminal N
•  If N → ε, then add ε to First(N)
•  If N → α1 α2 ... αn, then (note the αi are all the
symbols on the right side of one single production):
Ø  Add First(α1α2 ... αn) to First(N), where First(α1 α2 ... αn) is
defined as
•  First(α1) if ε ∉ First(α1)
•  Otherwise (First(α1) – ε) ∪ First(α2 ... αn)
Ø  If ε ∈ First(αi) for all i, 1 ≤ i ≤ k, then add ε to First(N)

CMSC 330 - Spring 2011 7


First( ) Examples
E → id = n | { L } E → id = n | { L } | ε
L→E;L|ε L→E;L

First(id) = { id } First(id) = { id }
First("=") = { "=" } First("=") = { "=" }
First(n) = { n } First(n) = { n }
First("{")= { "{" } First("{")= { "{" }
First("}")= { "}" } First("}")= { "}" }
First(";")= { ";" } First(";")= { ";" }
First(E) = { id, "{" } First(E) = { id, "{", ε }
First(L) = { id, "{", ε } First(L) = { id, "{", ";" }
CMSC 330 - Spring 2011 8
Recursive Descent Parser Implementation
!  For terminals, create function parse_a
•  If lookahead is a then parse_a consumes the lookahead
by advancing to the next token and then returns
•  Otherwise fails with a parse error if lookahead is not a
!  For each nonterminal N, create a function parse_N
•  Called when we’re trying to parse a part of the input
which corresponds to (or can be derived from) N
•  parse_S for the start symbol S begins the parse

CMSC 330 - Spring 2011 9


Parser Implementation (cont.)
!  The body of parse_N for a nonterminal N does
the following
•  Let N → β1 | ... | βk be the productions of N
Ø  Here βi is the entire right side of a production- a sequence of
terminals and nonterminals
•  Pick the production N → βi such that the lookahead is
in First(βi)
Ø  It must be that First(βi) ∩ First(βj) = ∅ for i ≠ j
Ø  If there is no such production, but N → ε then return
Ø  Otherwise fail with a parse error

•  Suppose βi = α1 α2 ... αn. Then call parse_α1(); ... ;


parse_αn() to match the expected right-hand side,
and return
CMSC 330 - Spring 2011 10
Recursive Descent Parser
!  Given grammar S → xyz | abc
•  First(xyz) = { x }, First(abc) = { a }
!  Parser
parse_S( ) {
if (lookahead == “x”) {
parse_x; parse_y; parse_z); // S → xyz
}
else if (lookahead == “a”) {
parse_a; parse_b; parse_c; // S → abc
}
else error( );
}
CMSC 330 - Spring 2011 11
Recursive Descent Parser
!  Given grammar S → A | B A→x |y B
→z
•  First(A) = { x, y }, First(B) = { z }
parse_A( ) {
!  Parser if (lookahead == “x”)
parse_S( ) { parse_x(); // A → x
if ((lookahead == “x”) || else if (lookahead == “y”)
(lookahead == “y”)) parse_y(); // A → y
parse_A( ); // S → A else error( );
}
else if (lookahead == “z”) parse_B( ) {
parse_B( ); // S → B if (lookahead == “z”)
else error( ); parse_z(); // B → z
} else error( );
}
CMSC 330 - Spring 2011 12
Example
E → id = n | { L } First(E) = { id, "{" }
L→E;L|ε
parse_E( ) { parse_L( ) {
if (lookahead == “id”) { if ((lookahead == “id”) ||
parse_id(); (lookahead == “{”)) {
parse_=(); // E → id = n parse_E( );
parse_n(); parse_; (); // L → E ; L
} parse_L( );
else if (lookahead == “{“) { }
parse_{ (); else ; // L → ε
parse_L( ); // E → { L } }
parse_} ();
}
else error( );
}
CMSC 330 - Spring 2011 13
Things to Notice
!  If you draw the execution trace of the parser
•  You get the parse tree
!  Examples
•  Grammar •  Grammar
S → xyz S→A|B
S → abc A→x |y
•  String “xyz” B→z
parse_S( ) •  String “x” S
S
parse_x() parse_S( ) |
/|\
parse_y()
x y z
parse_A( ) A
parse_z() parse_x |
x
CMSC 330 - Spring 2011 14
Things to Notice (cont.)
!  This is a predictive parser
•  Because the lookahead determines exactly which
production to use
!  This parsing strategy may fail on some grammars
•  Possible infinite recursion
•  Production First sets overlap
•  Production First sets contain ε
!  Does not mean grammar is not usable
•  Just means this parsing method not powerful enough
•  May be able to change grammar

CMSC 330 - Spring 2011 15


Left Factoring
!  Consider parsing the grammar E → ab | ac
•  First(ab) = a
•  First(ac) = a
•  Parser cannot choose between RHS based on
lookahead!
!  Parser fails whenever A → α1 | α2 and
•  First(α1) ∩ First(α2) != ε or ∅
!  Solution
•  Rewrite grammar using left factoring

CMSC 330 - Spring 2011 16


Left Factoring Algorithm
!  Given grammar
•  A → xα1 | xα2 | … | xαn | β
!  Rewrite grammar as
•  A → xL | β
•  L → α1 | α2 | … | αn
!  Repeat as necessary
!  Examples
•  S → ab | ac ⇨ S → aL L→b|c
•  S → abcA | abB | a ⇨ S → aL L → bcA | bB | ε
•  L → bcA | bB | ε ⇨ L → bL’ | ε L’ → cA |
B
CMSC 330 - Spring 2011 17
Left Recursion
!  Consider grammar S → Sa | ε
•  First(Sa) = a, so we’re ok as far as which production
•  Try writing parser parse_S( ) {
if (lookahead == “a”) {
parse_S( );
parse_a (); // S → Sa
}
else { }
}

•  Body of parse_S( ) has an infinite loop


Ø  if (lookahead = "a") then parse_S( )
•  Infinite loop occurs in grammar with left recursion
CMSC 330 - Spring 2011 18
Right Recursion
!  Consider grammar S → aS | ε
•  Again, First(aS) = a
•  Try writing parser parse_S( ) {
if (lookahead == “a”) {
parse_a();
parse_S( ); // S → aS
}
else { }
}

•  Will parse_S( ) infinite loop?


Ø  Invoking parse_tok( ) will advance lookahead, eventually stop
•  Top down parsers handles grammar w/ right recursion
CMSC 330 - Spring 2011 19
Expression Grammar for Top-Down Parsing
E → T E'
E' → ε | + E
T → P T'
T' → ε | * T
P→n | (E)

•  Notice we can always decide what production to


choose with only one symbol of lookahead

CMSC 330 - Spring 2011 20


Tradeoffs with Other Approaches
!  Recursive descent parsers are easy to write
•  The formal definition is a little clunky, but if you follow
the code then it’s almost what you might have done
if you weren't told about grammars formally
•  They're unable to handle certain kinds of grammars
!  Recursive descent is good for a simple parser
•  Though tools can be fast if you’re familiar with them
!  Can implement top-down predictive parsing as a
table-driven parser
•  By maintaining an explicit stack to track progress

CMSC 330 - Spring 2011 21


Tradeoffs with Other Approaches
!  More powerful techniques need tool support
•  Can take time to learn tools
!  Main alternative is bottom-up, shift-reduce parser
•  Replaces RHS of production with LHS (nonterminal)
•  Example grammar
Ø  S → aA, A → Bc, B → b
•  Example parse
Ø  abc ⇒ aBc ⇒ aA ⇒ S
Ø  Derivation happens in reverse

•  Something to look forward to in CMSC 430

CMSC 330 - Spring 2011 22


What’s Wrong With Parse Trees?
!  Parse trees contain too much information
•  Example
Ø  Parentheses
Ø  Extra nonterminals for precedence

•  This extra stuff is needed for parsing

!  But when we want to reason about languages


•  Extra information gets in the way (too much detail)

CMSC 330 - Spring 2011 23


Abstract Syntax Trees (ASTs)
!  An abstract syntax tree is a more compact,
abstract representation of a parse tree, with only
the essential parts

parse
AST
tree

CMSC 330 - Spring 2011 24


Abstract Syntax Trees (cont.)
!  Intuitively, ASTs correspond to the data structure
you’d use to represent strings in the language
•  Note that grammars describe trees
•  So do OCaml datatypes (which we’ll see later)
•  E → a | b | c | E+E | E-E | E*E | (E)

CMSC 330 - Spring 2011 25

You might also like