You are on page 1of 14

Example Expression Grammar

𝑆𝑡𝑎𝑟𝑡 → 𝐸𝑥𝑝𝑟
𝐸𝑥𝑝𝑟 → 𝐸𝑥𝑝𝑟 + 𝑇𝑒𝑟𝑚 𝐸𝑥𝑝𝑟 − 𝑇𝑒𝑟𝑚 𝑇𝑒𝑟𝑚

priority
CS 335: Top-down Parsing 𝑇𝑒𝑟𝑚 → 𝑇𝑒𝑟𝑚 × 𝐹𝑎𝑐𝑡𝑜𝑟 𝑇𝑒𝑟𝑚 ÷ 𝐹𝑎𝑐𝑡𝑜𝑟 𝐹𝑎𝑐𝑡𝑜𝑟
𝐹𝑎𝑐𝑡𝑜𝑟 → 𝐸𝑥𝑝𝑟 | num | name
Swarnendu Biswas

Semester 2022-2023-II
CSE, IIT Kanpur

Content influenced by many excellent references, see References slide for acknowledgements.

CS 335 Swarnendu Biswas

Derivation of name + name × name Derivation of name + name × name


Sentential Form Input Sentential Form Input
𝐸𝑥𝑝𝑟 ↑ name + name × name 𝐸𝑥𝑝𝑟 ↑ name + name × name
𝐸𝑥𝑝𝑟 + 𝑇𝑒𝑟𝑚 input ↑ name + name × name 𝐸𝑥𝑝𝑟 + 𝑇𝑒𝑟𝑚 ↑ name + name × name
𝑇𝑒𝑟𝑚 + 𝑇𝑒𝑟𝑚 pointer ↑ name + name × name 𝑇𝑒𝑟𝑚 + 𝑇𝑒𝑟𝑚 ↑ name + name × name
𝐹𝑎𝑐𝑡𝑜𝑟 + 𝑇𝑒𝑟𝑚 ↑ name + name × name 𝐹𝑎𝑐𝑡𝑜𝑟 + 𝑇𝑒𝑟𝑚 ↑ name + name × name
name + 𝑇𝑒𝑟𝑚 ↑ name + name × name name + 𝑇𝑒𝑟𝑚 ↑ name + name × name
name + 𝑇𝑒𝑟𝑚 name ↑ +name × name name + 𝑇𝑒𝑟𝑚 name ↑ +name × name
name + 𝑇𝑒𝑟𝑚 name +↑ name × name name + 𝑇𝑒𝑟𝑚 name +↑ name × name
name + 𝑇𝑒𝑟𝑚 × 𝐹𝑎𝑐𝑡𝑜𝑟 name +↑ name × name name + 𝑇𝑒𝑟𝑚 × 𝐹𝑎𝑐𝑡𝑜𝑟 name +↑ name × name
The current input terminal being scanned is called the
name + 𝐹𝑎𝑐𝑡𝑜𝑟 × 𝐹𝑎𝑐𝑡𝑜𝑟 name +↑ name × name name + 𝐹𝑎𝑐𝑡𝑜𝑟 × 𝐹𝑎𝑐𝑡𝑜𝑟 name +↑ name × name
name + name × 𝐹𝑎𝑐𝑡𝑜𝑟 name +↑ name × name
lookahead symbol
name + name × 𝐹𝑎𝑐𝑡𝑜𝑟 name +↑ name × name
name + name × 𝐹𝑎𝑐𝑡𝑜𝑟 name + name ↑× name name + name × 𝐹𝑎𝑐𝑡𝑜𝑟 name + name ↑× name
name + name × 𝐹𝑎𝑐𝑡𝑜𝑟 name + name ×↑ name name + name × 𝐹𝑎𝑐𝑡𝑜𝑟 name + name ×↑ name
name + name × name name + name ×↑ name name + name × name name + name ×↑ name
name + name × name name + name × name ↑ name + name × name name + name × name ↑
CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas
Derivation of name + name × name Derivation of name + name × name

𝑙𝑚 𝐸𝑥𝑝𝑟 𝑙𝑚 𝐸𝑥𝑝𝑟 𝑙𝑚 𝐸𝑥𝑝𝑟 𝑙𝑚 𝐸𝑥𝑝𝑟


𝑙𝑚 𝑙𝑚 𝐸𝑥𝑝𝑟 𝑙𝑚 𝐸𝑥𝑝𝑟 𝑙𝑚 𝐸𝑥𝑝𝑟 𝑙𝑚 𝐸𝑥𝑝𝑟
𝑆𝑡𝑎𝑟𝑡 𝐸𝑥𝑝𝑟
𝐸𝑥𝑝𝑟 + 𝑇𝑒𝑟𝑚 𝐸𝑥𝑝𝑟 + 𝑇𝑒𝑟𝑚 𝐸𝑥𝑝𝑟 + 𝑇𝑒𝑟𝑚 𝐸𝑥𝑝𝑟 + 𝑇𝑒𝑟𝑚
𝐸𝑥𝑝𝑟 + 𝑇𝑒𝑟𝑚 𝐸𝑥𝑝𝑟 + 𝑇𝑒𝑟𝑚 𝐸𝑥𝑝𝑟 + 𝑇𝑒𝑟𝑚 𝐸𝑥𝑝𝑟 + 𝑇𝑒𝑟𝑚

Term Term 𝑇𝑒𝑟𝑚 × 𝐹𝑎𝑐𝑡𝑜𝑟 Term 𝑇𝑒𝑟𝑚 × 𝐹𝑎𝑐𝑡𝑜𝑟 Term 𝑇𝑒𝑟𝑚 × 𝐹𝑎𝑐𝑡𝑜𝑟
𝑇𝑒𝑟𝑚 Term Term
𝐹𝑎𝑐𝑡𝑜𝑟 𝐹𝑎𝑐𝑡𝑜𝑟 𝐹𝑎𝑐𝑡𝑜𝑟 𝐹𝑎𝑐𝑡𝑜𝑟 𝐹𝑎𝑐𝑡𝑜𝑟 𝐹𝑎𝑐𝑡𝑜𝑟
𝐹𝑎𝑐𝑡𝑜𝑟 𝐹𝑎𝑐𝑡𝑜𝑟
name name name name name
name
continued from
previous slide

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

Derivation of name + name × name General Idea of Top-down Parsing


𝑙𝑚 𝑙𝑚
Start with the root (i.e., start symbol) of the parse tree
𝐸𝑥𝑝𝑟 𝐸𝑥𝑝𝑟

𝐸𝑥𝑝𝑟 + 𝑇𝑒𝑟𝑚 𝐸𝑥𝑝𝑟 + 𝑇𝑒𝑟𝑚 Grow the tree downwards by expanding productions at the lower levels of the
tree
Term 𝑇𝑒𝑟𝑚 × 𝐹𝑎𝑐𝑡𝑜𝑟 Term 𝑇𝑒𝑟𝑚 × 𝐹𝑎𝑐𝑡𝑜𝑟
• Select a nonterminal and extend it by adding children corresponding to the right side of some
production for the nonterminal
𝐹𝑎𝑐𝑡𝑜𝑟 𝐹𝑎𝑐𝑡𝑜𝑟 𝐹𝑎𝑐𝑡𝑜𝑟 𝐹𝑎𝑐𝑡𝑜𝑟 name

name name name name


Repeat till

• Lower fringe consists only terminals and the input is consumed

Top-down parsing finds a leftmost derivation for an input string

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas


General Idea of Top-down Parsing Top-down Parsing Algorithm
Start with the root of the parse tree root = node for the Start symbol if curr == word:
curr = root word = getNextWord()
Grow the tree by expanding productions at the lower levels of the tree push(null) // Stack curr = pop() // consumed
word = getNextWord() if word == eof and curr == null:
• Extend a nonterminal by adding children corresponding to the right side of some
production for the nonterminal accept input
while (true): else
Repeat till if curr ∈ Nonterminal: backtrack()
• Lower fringe consists only terminals and the input is consumed pick next rule 𝐴 ⟶ 𝛽1 𝛽2 … 𝛽𝑛 to
expand curr
• Mismatch in the lower fringe and the remaining input stream implies
create nodes for 𝛽1 , 𝛽2 , …, 𝛽𝑛
i. Wrong choice of productions while expanding nonterminals, selection of a as children of curr
production may involve trial-and-error
push(𝛽𝑛 ,𝛽𝑛−1 ,…, 𝛽1 )
ii. Input character stream is not part of the language
curr = 𝛽1

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

Implementing Backtracking Cost of Backtracking


• A large subset of CFGs can be parsed without backtracking
• The grammar may require transformations Backtracking is expensive

• Steps in backtracking • Parser expands a nonterminal with the wrong rule


• Set curr to parent and delete the children • Mismatch between the lower fringe of the parse tree and the input is detected
• Expand the node curr with untried rules if any • Parser undoes the last few actions
• Create child nodes for each symbol in the right hand of the production • Parser tries other productions if any
• Push those symbols onto the stack in reverse order
• Set curr to the first child node
•Move curr up the tree if there are no untried rules
• Report a syntax error when there are no more moves

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas


Derivation of name + name × name Derivation of name + name × name
Rule # Production Rule # Sentential Form Input Rule # Production Rule # Sentential Form Input
0 𝑆𝑡𝑎𝑟𝑡 → 𝐸𝑥𝑝𝑟 𝐸𝑥𝑝𝑟 ↑ name + name × name 0 𝑆𝑡𝑎𝑟𝑡 → 𝐸𝑥𝑝𝑟 𝐸𝑥𝑝𝑟 ↑ name + name × name

1 𝐸𝑥𝑝𝑟 → 𝐸𝑥𝑝𝑟 + 𝑇𝑒𝑟𝑚 1 𝐸𝑥𝑝𝑟 + 𝑇𝑒𝑟𝑚 ↑ name + name × name 1 𝐸𝑥𝑝𝑟 → 𝐸𝑥𝑝𝑟 + 𝑇𝑒𝑟𝑚 1 𝐸𝑥𝑝𝑟 + 𝑇𝑒𝑟𝑚 ↑ name + name × name

2 𝐸𝑥𝑝𝑟 → 𝐸𝑥𝑝𝑟 − 𝑇𝑒𝑟𝑚 3 𝑇𝑒𝑟𝑚 + 𝑇𝑒𝑟𝑚 ↑ name + name × name 2 𝐸𝑥𝑝𝑟 → 𝐸𝑥𝑝𝑟 − 𝑇𝑒𝑟𝑚 3 𝑇𝑒𝑟𝑚 + 𝑇𝑒𝑟𝑚 ↑ name + name × name

3 𝐸𝑥𝑝𝑟 → 𝑇𝑒𝑟𝑚 6 𝐹𝑎𝑐𝑡𝑜𝑟 + 𝑇𝑒𝑟𝑚 ↑ name + name × name 3 𝐸𝑥𝑝𝑟 → 𝑇𝑒𝑟𝑚 6 𝐹𝑎𝑐𝑡𝑜𝑟 + 𝑇𝑒𝑟𝑚 ↑ name + name × name

4 𝑇𝑒𝑟𝑚 → 𝑇𝑒𝑟𝑚 × 𝐹𝑎𝑐𝑡𝑜𝑟 9 name + 𝑇𝑒𝑟𝑚 ↑ name + name × name 4 𝑇𝑒𝑟𝑚 → 𝑇𝑒𝑟𝑚 × 𝐹𝑎𝑐𝑡𝑜𝑟 9 name + 𝑇𝑒𝑟𝑚 ↑ name + name × name
name + 𝑇𝑒𝑟𝑚 name ↑ +name × name name + 𝑇𝑒𝑟𝑚 name ↑ +name × name
5 𝑇𝑒𝑟𝑚 → 𝑇𝑒𝑟𝑚 ÷ 𝐹𝑎𝑐𝑡𝑜𝑟 5 𝑇𝑒𝑟𝑚 → 𝑇𝑒𝑟𝑚 ÷ 𝐹𝑎𝑐𝑡𝑜𝑟
name + 𝑇𝑒𝑟𝑚 name +↑ name × name name + 𝑇𝑒𝑟𝑚 name +↑ name × name
6 𝑇𝑒𝑟𝑚 → 𝐹𝑎𝑐𝑡𝑜𝑟 6 𝑇𝑒𝑟𝑚 → 𝐹𝑎𝑐𝑡𝑜𝑟
4 name + 𝑇𝑒𝑟𝑚 × 𝐹𝑎𝑐𝑡𝑜𝑟 name +↑ name × name How does a top-down parser
4 choose
name + which rule toname
𝑇𝑒𝑟𝑚 × 𝐹𝑎𝑐𝑡𝑜𝑟 apply?
+↑ name × name
7 𝐹𝑎𝑐𝑡𝑜𝑟 → (𝐸𝑥𝑝𝑟) 7 𝐹𝑎𝑐𝑡𝑜𝑟 → (𝐸𝑥𝑝𝑟)
6 name + 𝐹𝑎𝑐𝑡𝑜𝑟 × 𝐹𝑎𝑐𝑡𝑜𝑟 name +↑ name × name 6 name + 𝐹𝑎𝑐𝑡𝑜𝑟 × 𝐹𝑎𝑐𝑡𝑜𝑟 name +↑ name × name
8 𝐹𝑎𝑐𝑡𝑜𝑟 → num 8 𝐹𝑎𝑐𝑡𝑜𝑟 → num
9 name + name × 𝐹𝑎𝑐𝑡𝑜𝑟 name +↑ name × name 9 name + name × 𝐹𝑎𝑐𝑡𝑜𝑟 name +↑ name × name
9 𝐹𝑎𝑐𝑡𝑜𝑟 → name 9 𝐹𝑎𝑐𝑡𝑜𝑟 → name
name + name × 𝐹𝑎𝑐𝑡𝑜𝑟 name + name ↑× name name + name × 𝐹𝑎𝑐𝑡𝑜𝑟 name + name ↑× name
name + name × 𝐹𝑎𝑐𝑡𝑜𝑟 name + name ×↑ name name + name × 𝐹𝑎𝑐𝑡𝑜𝑟 name + name ×↑ name
9 name + name × name name + name ×↑ name 9 name + name × name name + name ×↑ name
name + name × name name + name × name ↑ name + name × name name + name × name ↑
CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

Selecting a Production
Left Recursion
Rule # Production Rule # Sentential Form Input
0 𝑆𝑡𝑎𝑟𝑡 → 𝐸𝑥𝑝𝑟 𝐸𝑥𝑝𝑟 ↑ name + name × name
• A grammar is left-recursive if it has a nonterminal 𝐴 such that there is
1 𝐸𝑥𝑝𝑟 → 𝐸𝑥𝑝𝑟 + 𝑇𝑒𝑟𝑚 1 𝐸𝑥𝑝𝑟 + 𝑇𝑒𝑟𝑚 ↑ name + name × name +
2 𝐸𝑥𝑝𝑟 → 𝐸𝑥𝑝𝑟 − 𝑇𝑒𝑟𝑚 1 𝐸𝑥𝑝𝑟 + 𝑇𝑒𝑟𝑚 + 𝑇𝑒𝑟𝑚 ↑ name + name × name a derivation 𝐴 ֜ 𝐴𝛼 for some string 𝛼
3 𝐸𝑥𝑝𝑟 → 𝑇𝑒𝑟𝑚 1 𝐸𝑥𝑝𝑟 + 𝑇𝑒𝑟𝑚 + 𝑇𝑒𝑟𝑚 + ⋯ ↑ name + name × name • Direct left recursion: There is a production of the form 𝐴 → 𝐴𝛼
4 𝑇𝑒𝑟𝑚 → 𝑇𝑒𝑟𝑚 × 𝐹𝑎𝑐𝑡𝑜𝑟 1 … ↑ name + name × name • Indirect left recursion: First symbol on the right-hand side of a rule can derive
5 𝑇𝑒𝑟𝑚 → 𝑇𝑒𝑟𝑚 ÷ 𝐹𝑎𝑐𝑡𝑜𝑟 1 … ↑ name + name × name the symbol on the left
6 𝑇𝑒𝑟𝑚 → 𝐹𝑎𝑐𝑡𝑜𝑟
7 𝐹𝑎𝑐𝑡𝑜𝑟 → (𝐸𝑥𝑝𝑟) We can often reformulate a grammar to
A𝐹𝑎𝑐𝑡𝑜𝑟
top-down parser can loop indefinitely with left-recursive
8 → num avoid left recursion
9 grammar
𝐹𝑎𝑐𝑡𝑜𝑟 → name

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas


Remove Direct Left Recursion Remove Direct Left Recursion

𝐴 → 𝐴𝛼1 𝐴𝛼2 … |𝐴𝛼𝑚 𝛽1 … |𝛽𝑛 𝐸 → 𝑇𝐸 ′


𝐸 →𝐸+𝑇|𝑇 𝐸 ′ → +𝑇𝐸 ′
𝑇 →𝑇∗𝐹|𝐹 𝑇 → 𝐹𝑇 ′
𝐹 → 𝐸 | id 𝑇 ′ → ∗ 𝐹𝑇 ′
𝐹 → 𝐸 | id
𝐴 → 𝛽1 𝐴′ |𝛽2 𝐴′ |…| 𝛽𝑛 𝐴′
𝐴′ → 𝛼1 𝐴′ 𝛼2 𝐴′ … |𝛼𝑚 𝐴′ |𝜖

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

Non-Left-Recursive Expression Grammar


Indirect Left Recursion
Rule # Production Rule # Production
0 𝑆𝑡𝑎𝑟𝑡 → 𝐸𝑥𝑝𝑟 0 𝑆𝑡𝑎𝑟𝑡 → 𝐸𝑥𝑝𝑟
1 𝐸𝑥𝑝𝑟 → 𝐸𝑥𝑝𝑟 + 𝑇𝑒𝑟𝑚 1 𝐸𝑥𝑝𝑟 → 𝑇𝑒𝑟𝑚 𝐸𝑥𝑝𝑟 ′
𝑆 → 𝐴𝑎 | 𝑏
2 𝐸𝑥𝑝𝑟 → 𝐸𝑥𝑝𝑟 − 𝑇𝑒𝑟𝑚 2 𝐸𝑥𝑝𝑟 ′ → + 𝑇𝑒𝑟𝑚 𝐸𝑥𝑝𝑟 ′
3 𝐸𝑥𝑝𝑟 → 𝑇𝑒𝑟𝑚 3 𝐸𝑥𝑝𝑟 ′ → − 𝑇𝑒𝑟𝑚 𝐸𝑥𝑝𝑟 ′
𝐴 → 𝐴𝑐 𝑆𝑑 𝜖
4 𝑇𝑒𝑟𝑚 → 𝑇𝑒𝑟𝑚 × 𝐹𝑎𝑐𝑡𝑜𝑟 4 𝐸𝑥𝑝𝑟 ′ → 𝜖
5 𝑇𝑒𝑟𝑚 → 𝑇𝑒𝑟𝑚 ÷ 𝐹𝑎𝑐𝑡𝑜𝑟 5 𝑇𝑒𝑟𝑚 → 𝐹𝑎𝑐𝑡𝑜𝑟 𝑇𝑒𝑟𝑚′
6 𝑇𝑒𝑟𝑚 → 𝐹𝑎𝑐𝑡𝑜𝑟 6 𝑇𝑒𝑟𝑚′ →× 𝐹𝑎𝑐𝑡𝑜𝑟 𝑇𝑒𝑟𝑚′ • There is a left recursion because 𝑆 → 𝐴𝑎 → 𝑆𝑑𝑎
7 𝐹𝑎𝑐𝑡𝑜𝑟 → (𝐸𝑥𝑝𝑟) 7 𝑇𝑒𝑟𝑚′ →÷ 𝐹𝑎𝑐𝑡𝑜𝑟 𝑇𝑒𝑟𝑚′
8 𝐹𝑎𝑐𝑡𝑜𝑟 → num 8 𝑇𝑒𝑟𝑚′ → 𝜖
9 𝐹𝑎𝑐𝑡𝑜𝑟 → name 9 𝐹𝑎𝑐𝑡𝑜𝑟 → (𝐸𝑥𝑝𝑟)
10 𝐹𝑎𝑐𝑡𝑜𝑟 → num
11 𝐹𝑎𝑐𝑡𝑜𝑟 → name

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas


Eliminating Left Recursion Eliminating Indirect Left Recursion
• Input: Grammar 𝐺 with no cycles or 𝜖-productions
• Algorithm 𝑆 → 𝐴𝑎 | 𝑏 𝑆 → 𝐴𝑎 | 𝑏
Arrange nonterminals in some order 𝐴1 , 𝐴2 , … , 𝐴𝑛 𝐴 → 𝐴𝑐 𝑆𝑑 𝜖 𝐴 → 𝑏𝑑𝐴′ | 𝐴′
for 𝑖 ← 1 … 𝑛 𝐴′ → 𝑐𝐴′ 𝑎𝑑𝐴′ 𝜖
for 𝑗 ← 1 to 𝑖 − 1
If ∃ a production 𝐴𝑖 → 𝐴𝑗 𝛾
Replace 𝐴𝑖 → 𝐴𝑗 𝛾 with one or more productions that expand 𝐴𝑗
Eliminate the immediate left recursion among the 𝐴𝑖 productions

Loop invariant at the start of outer iteration 𝑖


∀𝑘 < 𝑖, no production expanding 𝐴𝑘 has 𝐴𝑙 in its righthand side for all 𝑙 < 𝑘
CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

Avoid Backtracking FIRST Set


• Parser is to select the next rule • Intuition
• Compare the curr symbol and the next input symbol called the lookahead • Each alternative for the leftmost nonterminal leads to a distinct terminal
• Use the lookahead to disambiguate the possible production rules symbol
• Which rule to choose becomes obvious by comparing the next word in the
input stream
• Backtrack-free grammar is a CFG for which a leftmost, top-down
parser can always predict the correct rule with one word lookahead
• Also called a predictive grammar • Given a string 𝛾 of terminal and nonterminal symbols, FIRST(𝛾) is the
set of all terminal symbols that can begin any string derived from 𝛾
• We also need to keep track of which symbols can produce the empty string
• FIRST: (𝑁𝑇 ∪ 𝑇 ∪ 𝜖, EOF ) → (𝑇 ∪ 𝜖, EOF )

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas


Steps to Compute FIRST Set FIRST Set
1. If 𝑋 is a terminal, then FIRST 𝑋 = {𝑋} • Generalize FIRST relation to string of symbols
2. If 𝑋 → 𝜖 is a production, then 𝜖 ∈ FIRST(𝑋)
3. If 𝑋 is a nonterminal and 𝑋 → 𝑌1 𝑌2 … 𝑌𝑘 is a production FIRST 𝑋𝛾 → FIRST 𝑋 if 𝑋 ↛ 𝜖
I. Everything in FIRST(𝑌1 ) is in FIRST 𝑋 FIRST 𝑋𝛾 → FIRST 𝑋 ∪ FIRST 𝛾 if 𝑋 → 𝜖
II. If for some 𝑖, 𝑎 ∈ FIRST(𝑌𝑖 ) and ∀1 ≤ 𝑗 < 𝑖, 𝜖 ∈ FIRST(𝑌𝑗 ), then 𝑎 ∈
FIRST(𝑋)
III. If 𝜖 ∈ FIRST(𝑌1 … 𝑌𝑘 ), then 𝜖 ∈ FIRST(𝑋)

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

Compute FIRST Set FOLLOW Set


𝑆𝑡𝑎𝑟𝑡 → 𝐸𝑥𝑝𝑟 FIRST 𝐸𝑥𝑝𝑟 = {(, name, num} • FOLLOW(𝑋) is the set of terminals that can immediately follow 𝑋
𝐸𝑥𝑝𝑟 → 𝑇𝑒𝑟𝑚 𝐸𝑥𝑝𝑟 ′ FIRST 𝐸𝑥𝑝𝑟 ′ = {+, −, 𝜖} • That is, 𝑡 ∈ FOLLOW(𝑋) if there is any derivation containing 𝑋𝑡
𝐸𝑥𝑝𝑟 ′ → +𝑇𝑒𝑟𝑚 𝐸𝑥𝑝𝑟 ′ FIRST 𝑇𝑒𝑟𝑚 = {(, name, num}
−𝑇𝑒𝑟𝑚 𝐸𝑥𝑝𝑟 ′ 𝜖 𝑆
FIRST 𝑇𝑒𝑟𝑚′ = {𝜖,×,÷}
𝑇𝑒𝑟𝑚 → 𝐹𝑎𝑐𝑡𝑜𝑟 𝑇𝑒𝑟𝑚′
FIRST 𝐹𝑎𝑐𝑡𝑜𝑟 = {(, name,num} Terminal 𝑐 is in FIRST(𝐴) and 𝑎
𝑇𝑒𝑟𝑚′ →× 𝐹𝑎𝑐𝑡𝑜𝑟 𝑇𝑒𝑟𝑚′ 𝛼 𝐴 𝑎 𝛽 is in FOLLOW(𝐴)
÷ 𝐹𝑎𝑐𝑡𝑜𝑟 𝑇𝑒𝑟𝑚′ 𝜖
𝑐 … 𝛾
𝐹𝑎𝑐𝑡𝑜𝑟 → (𝐸𝑥𝑝𝑟) | num | name

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas


Steps to Compute FOLLOW Set Compute FOLLOW Set
1. Place $ in FOLLOW(𝑆) where 𝑆 is the start symbol and $ is the end 𝑆𝑡𝑎𝑟𝑡 → 𝐸𝑥𝑝𝑟 FOLLOW 𝐸𝑥𝑝𝑟 = {$, )}
marker 𝐸𝑥𝑝𝑟 → 𝑇𝑒𝑟𝑚 𝐸𝑥𝑝𝑟 ′ FOLLOW 𝐸𝑥𝑝𝑟 ′ = {$,)}
2. If there is a production 𝐴 → 𝛼𝐵𝛽, then everything in FIRST(𝛽) 𝐸𝑥𝑝𝑟 ′ → +𝑇𝑒𝑟𝑚 𝐸𝑥𝑝𝑟 ′ FOLLOW 𝑇𝑒𝑟𝑚 = {$, +, −, )}
except 𝜖 is in FOLLOW(𝐵) −𝑇𝑒𝑟𝑚 𝐸𝑥𝑝𝑟 ′ 𝜖
FOLLOW 𝑇𝑒𝑟𝑚′ = {$,+, −, )}
3. If there is a production 𝐴 → 𝛼𝐵, or a production 𝐴 → 𝛼𝐵𝛽 where 𝑇𝑒𝑟𝑚 → 𝐹𝑎𝑐𝑡𝑜𝑟 𝑇𝑒𝑟𝑚′
FIRST(𝛽) contains 𝜖, then everything in FOLLOW(𝐴) is in FOLLOW 𝐹𝑎𝑐𝑡𝑜𝑟 = {$, +, −,×,÷, )}
𝑇𝑒𝑟𝑚′ →× 𝐹𝑎𝑐𝑡𝑜𝑟 𝑇𝑒𝑟𝑚′
FOLLOW(𝐵) ÷ 𝐹𝑎𝑐𝑡𝑜𝑟 𝑇𝑒𝑟𝑚′ 𝜖
𝐹𝑎𝑐𝑡𝑜𝑟 → (𝐸𝑥𝑝𝑟) | num | name

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

Conditions for Backtrack-Free Grammar Backtracking


• Consider a production 𝐴 → 𝛽 𝑆𝑡𝑎𝑟𝑡 → 𝐸𝑥𝑝𝑟 𝐹𝑎𝑐𝑡𝑜𝑟 → name
𝐸𝑥𝑝𝑟 → 𝑇𝑒𝑟𝑚𝐸𝑥𝑝𝑟 ′ | name [ 𝐴𝑟𝑔𝑙𝑖𝑠𝑡 ]
FIRST 𝛽 if 𝜖 ∉ FIRST(𝛽)
FIRST + = ቊ 𝐸𝑥𝑝𝑟 ′ → +𝑇𝑒𝑟𝑚𝐸𝑥𝑝𝑟 ′ | name ( 𝐴𝑟𝑔𝑙𝑖𝑠𝑡 )
FIRST 𝛽 ∪ FOLLOW 𝐴 otherwise −𝑇𝑒𝑟𝑚𝐸𝑥𝑝𝑟 ′ 𝜖 𝐴𝑟𝑔𝑙𝑖𝑠𝑡 → 𝐸𝑥𝑝𝑟 𝑀𝑜𝑟𝑒𝐴𝑟𝑔𝑠
𝑇𝑒𝑟𝑚 → 𝐹𝑎𝑐𝑡𝑜𝑟𝑇𝑒𝑟𝑚′ 𝑀𝑜𝑟𝑒𝐴𝑟𝑔𝑠 → , 𝐸𝑥𝑝𝑟 𝑀𝑜𝑟𝑒𝐴𝑟𝑔𝑠
𝑇𝑒𝑟𝑚′ → × 𝐹𝑎𝑐𝑡𝑜𝑟𝑇𝑒𝑟𝑚′ |𝜖
• For any nonterminal 𝐴 where 𝐴 → 𝛽1 |𝛽2 |…| 𝛽𝑛 , a backtrack-free
÷ 𝐹𝑎𝑐𝑡𝑜𝑟𝑇𝑒𝑟𝑚′ 𝜖
grammar has the property
𝐹𝑎𝑐𝑡𝑜𝑟 → 𝐸𝑥𝑝𝑟 | num
FIRST + 𝐴 → 𝛽𝑖 ∩ FIRST + 𝐴 → 𝛽𝑗 = 𝜙, ∀1 ≤ 𝑖, 𝑗 ≤ 𝑛, 𝑖 ≠ 𝑗

Not all grammars are backtrack free

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas


Left Factoring Key Insight in Using Top-Down Parsing
• Left factoring is the process of extracting and isolating common • Efficiency depends on the accuracy of selecting the correct
prefixes in a set of productions production for expanding a nonterminal
𝐹𝑎𝑐𝑡𝑜𝑟 → 𝑛𝑎𝑚𝑒 𝐴𝑟𝑔𝑢𝑚𝑒𝑛𝑡𝑠
• Parser may not terminate in the worst case
𝐴𝑟𝑔𝑢𝑚𝑒𝑛𝑡𝑠 → 𝐴𝑟𝑔𝐿𝑖𝑠𝑡 𝐴𝑟𝑔𝐿𝑖𝑠𝑡 𝜖 • A large subset of the context-free grammars can be parsed without
backtracking
• Algorithm
𝐴 → 𝛼𝛽1 𝛼𝛽2 … 𝛼𝛽𝑛 𝛾1 𝛾2 … |𝛾𝑗

𝐴 → 𝛼𝐵|𝛾1 𝛾2 … |𝛾𝑗
𝐵 → 𝛽1 𝛽2 … |𝛽𝑛

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

Recursive-Descent Parsing
• Recursive-descent parsing is a form of top-down parsing that may
require backtracking
• Consists of a set of procedures, one for each nonterminal
void A() {
Choose an A-production 𝐴 → 𝑋1 𝑋2 … 𝑋𝑘

Recursive-Descent Parsing for 𝑖 ← 1 … 𝑘


if 𝑋𝑖 is a nonterminal
call procedure 𝑋𝑖 ()
else if 𝑋𝑖 equals the current input symbol 𝑎
advance the input to the next symbol
else
// error
}
CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas
Limitations with Recursive-Descent Parsing Recursive-Descent Parsing with Backtracking
• Consider a grammar with two productions 𝑋 → 𝛾1 and 𝑋 → 𝛾2 • To support backtracking
• Suppose FIRST(𝛾1 ) ∩ FIRST(𝛾2 ) ≠ 𝜙 • All productions should be tried in some order
• Say 𝑎 is the common terminal symbol • Failure for some production implies we need to try remaining productions
• Report an error only when there are no other rules
• Function corresponding to 𝑋 will not know which production to use
on input token 𝑎

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

Predictive Parsing Pseudocode for a Predictive Parser


void stmt() {
switch(lookahead) {
• Special case of recursive-descent parsing that does not require case expr:
backtracking match(expr); match(‘;’); break;
• Lookahead symbol unambiguously determines which production rule to use case if:
match(if); match(‘(‘); match(expr); match(‘)’); stmt(); break;
• Advantage is that the algorithm is simple and the parser can be constructed case for:
by hand match(for); match(‘(‘); optexpr(); match(‘;’); optexpr();
match(‘;’); optexpr(); match(‘)’); stmt(); break;
𝑠𝑡𝑚𝑡 → expr ; case other:
| if 𝑒𝑥𝑝𝑟 𝑠𝑡𝑚𝑡 match(other); break;
| for 𝑜𝑝𝑡𝑒𝑥𝑝𝑟 ; 𝑜𝑝𝑡𝑒𝑥𝑝𝑟 ; 𝑜𝑝𝑡𝑒𝑥𝑝𝑟 𝑠𝑡𝑚𝑡
default:
| other
report(“syntax error”);
𝑜𝑝𝑡𝑒𝑥𝑝𝑟 → 𝜖 | expr
}
}

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas


LL(1) Grammars Nonrecursive Table-Driven Predictive Parser
• Class of grammars for which no backtracking is required Input a + b $

• First L stands for left-to-right scan, second L stands for leftmost derivation
• There is one lookahead token
• In LL(k), k stands for k lookahead tokens Predictive
Stack X Output
• Predictive parsers accept LL(k) grammars Parsing Program
• Every LL(1) grammar is a LL(2) grammar Y

$ Parsing Table 𝑀

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

Predictive Parsing Algorithm Construction of a Predictive Parsing Table


• Input: String 𝑤 and parsing table 𝑀 for grammar 𝐺
• Algorithm: • Input: Grammar 𝐺
Let 𝑎 be the first symbol in 𝑤
Let 𝑋 be the symbol at the top of the stack
while 𝑋 ≠ $: • Algorithm:
if 𝑋 == 𝑎: • For each production 𝐴 → 𝛼 in 𝐺,
pop the stack and advance the input
• For each terminal 𝑎 in FIRST 𝛼 , add 𝐴 → 𝛼 to 𝑀[𝐴, 𝑎]
else if 𝑋 is a terminal or 𝑀[𝑋, 𝑎] is an error entry:
error • If 𝜖 is in FIRST 𝛼 , then for each terminal 𝑏 in FOLLOW(𝐴), add 𝐴 → 𝛼 to 𝑀 𝐴, 𝑏
else if 𝑀 𝑋, 𝑎 == 𝑋 → 𝑌1 𝑌2 … 𝑌𝑘 : • If 𝜖 is in FIRST 𝛼 and $ is in FOLLOW(𝐴), add 𝐴 → 𝛼 to 𝑀[𝐴, $]
output the production • No production in 𝑀[𝐴, 𝑎] indicates error
pop the stack
push 𝑌𝑘 𝑌𝑘−1 … 𝑌1 onto the stack
𝑋 ← top stack symbol

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas


Predictive Parsing Table 𝐸 → 𝑇𝐸 ′ Working of Predictive Parser
𝐸 ′ → +𝑇𝐸 ′ | 𝜖
FIRST 𝐸 = FIRST 𝑇 = FIRST 𝐹 = {(, id} FOLLOW 𝐸 = FOLLOW 𝐸 ′ = {$, )} 𝑇 → 𝐹𝑇 ′ Matched Stack Input Action
FIRST 𝐸 ′
= {+, 𝜖} FOLLOW 𝑇 = FOLLOW 𝑇 ′ = {$,+, )} 𝑇 ′ →∗ 𝐹𝑇 ′ | 𝜖 𝐸$ id + id ∗ id$
FIRST 𝑇 ′ = {𝜖, ×} FOLLOW 𝐹 = {$, +,×, )} 𝐹 → 𝐸 | id
𝑇𝐸 ′ $ id + id ∗ id$ Output 𝐸 → 𝑇𝐸 ′
𝐹𝑇 ′ 𝐸 ′ $ id + id ∗ id$ Output 𝑇 → 𝐹𝑇 ′
′ ′
Nonterminal id + * ( ) $ id𝑇 𝐸 $ id + id ∗ id$ Output 𝐹 → id
′ ′
id 𝑇𝐸$ +id ∗ id$ Match id
𝐸 𝐸 → 𝑇𝐸 ′ 𝐸 → 𝑇𝐸 ′
id 𝐸′$ +id ∗ id$ Output 𝑇 ′ → 𝜖
𝐸′ 𝐸 ′ → +𝑇𝐸 ′ 𝐸′ → 𝜖 𝐸′ → 𝜖
id +𝑇𝐸 ′ $ +id ∗ id$ Output 𝐸 ′ → +𝑇𝐸 ′
′ ′
𝑇 𝑇 → 𝐹𝑇 𝑇 → 𝐹𝑇 id + 𝑇𝐸 ′ $ id ∗ id$ Match +
′ ′
𝑇′ 𝑇′ → 𝜖 𝑇 ′ →∗ 𝐹𝑇 ′ 𝑇′ → 𝜖 𝑇′ → 𝜖 id + 𝐹𝑇 𝐸 $ id ∗ id$ Output 𝑇 → 𝐹𝑇 ′
id + id𝐓 ′ 𝐸 ′ $ id ∗ id$ Output 𝐹 → id
𝐹 𝐹 → id 𝐹 → (𝐸)

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

Working of Predictive Parser


Matched Stack Input Action
Predictive Parsing

id + id𝑇 ′ 𝐸 ′ $ id ∗ id$ Output 𝐹 → id • Grammars whose predictive parsing tables contain no duplicate
id + id 𝑇𝐸$′ ′
∗ id$ Match id entries are called LL(1)
id + id ′ ′
∗ 𝐹𝑇 𝐸 $ ∗ id$ Output 𝑇 ′ →∗ 𝐹𝑇 ′
• No left-recursive or ambiguous grammar can be LL(1)
id + id∗ 𝐹𝑇 ′ 𝐸 ′ $ id$ Match ∗ • If grammar 𝐺 is left-recursive or is ambiguous, then parsing table 𝑀
id + id∗ id𝑇 ′ 𝐸 ′ $ id$ Output 𝐹 → id
will have at least one multiply-defined cell
id + id∗id 𝑇 ′𝐸′$ $ Match id • Some grammars cannot be transformed into LL(1)
id + id∗id 𝐸$′
$ Output 𝑇 ′ → 𝜖 • The adjacent grammar is ambiguous
𝑆 → 𝑖𝐸𝑡𝑆𝑆 ′ | 𝑎

id + id∗id $ $ Output 𝐸 → 𝜖 𝑆 ′ → 𝑒𝑆 | 𝜖
𝐸→𝑏

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas


Predictive Parsing Table 𝑆 → 𝑖𝐸𝑡𝑆𝑆 ′ | 𝑎
𝑆 ′ → 𝑒𝑆 | 𝜖
Error Recovery in Predictive Parsing
𝐸→𝑏
• Error conditions
• Terminal on top of the stack does not match the next input symbol
• Nonterminal 𝐴 is on top of the stack, 𝑎 is the next input symbol, and 𝑀[𝐴, 𝑎]
Nonterminal a b e i t $ is error
𝑆 𝑆→𝑎 𝑆 → 𝑖𝐸𝑡𝑆𝑆 ′
• Choices
𝑆′ 𝑆′ → 𝜖 𝑆′ → 𝜖 i. Raise an error and quit parsing
𝑆 ′ → 𝑒𝑆
ii. Print an error message, try to recover from the error, and continue with
𝐸 𝐸→𝑏 𝑇 → 𝐹𝑇 ′ compilation

What do we do when
we see an else?

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

Predictive Parsing Table with Synchronizing


Error Recovery in Predictive Parsing Tokens FOLLOW 𝐸 = FOLLOW 𝐸 ′ = {$, )}
𝐸 → 𝑇𝐸 ′
FOLLOW 𝑇 = FOLLOW 𝑇 ′ = {$,+, )}
𝐸 → +𝑇𝐸 | 𝜖 ′ ′

FOLLOW 𝐹 = {$, +,×, )} 𝑇 → 𝐹𝑇 ′


• Panic mode – skip over symbols until a token in a set of synchronizing 𝑇 ′ →∗ 𝐹𝑇 ′ | 𝜖
(synch) tokens appears 𝐹 → 𝐸 | id
• Add all tokens in FOLLOW(𝐴) to the synch set for 𝐴, parsing can continue if
the parser sees an input symbol in FOLLOW(𝐴) Nonterminal id + * ( ) $
• Add symbols in FIRST(𝐴) to the synch set for 𝐴, parsing can continue with 𝐸 𝐸 → 𝑇𝐸 ′
𝐸 → 𝑇𝐸 ′
synch synch
the nonterminal 𝐴 that is at the top of the stack
• Add keywords that can begin constructs 𝐸′ 𝐸 ′ → +𝑇𝐸 ′ 𝐸′ → 𝜖 𝐸′ → 𝜖
• …
𝑇 𝑇 → 𝐹𝑇 ′ synch 𝑇 → 𝐹𝑇 ′ synch synch
• Other error handling policies
• Skip input if the table does not have an entry 𝑇′ 𝑇′ → 𝜖 𝑇 ′ →∗ 𝐹𝑇 ′ 𝑇′ → 𝜖 𝑇′ → 𝜖
• Pop nonterminal if the table entry is synch
𝐹 𝐹 → id synch synch 𝐹 → (𝐸) synch synch

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas


Error Recover Moves by Predictive Parser Error Recover Moves by Predictive Parser
Stack Input Remark Stack Input Remark

𝐸$ +id ∗ +id$ Entry is blank, Error, skip + 𝐸$ +id$ …continuation
𝐸$ id ∗ +id$ +𝑇𝐸 ′ $ +id$
𝑇𝐸 ′ $ id ∗ +id$ 𝑇𝐸 ′ $ id$
′ ′ ′
𝐹𝑇𝐸 $ id ∗ +id$ 𝐹𝑇 𝐸 $ id$
id𝑇𝐸 ′ $ id ∗ +id$ id𝑇 ′ 𝐸 ′ $ id$
𝑇 ′𝐸′$ ∗ +id$ 𝑇 ′𝐸′$ $
′ ′ ′
∗ 𝐹𝑇 𝐸 $ ∗ +id$ 𝐸$ $
𝐹𝑇 ′ 𝐸 ′ $ +id$ Error, 𝑀 𝐹, + = synch, pop 𝐹 $ $
𝑇 ′𝐸′$ +id$
𝐸′$ +id$

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

References
• A. Aho et al. Compilers: Principles, Techniques, and Tools, 2nd edition, Chapter 4.4.
• K. Cooper and L. Torczon. Engineering a Compiler, 2nd edition, Chapter 3.3.

CS 335 Swarnendu Biswas

You might also like