5 Compiler - Slide

Rightmost Derivation of 𝑎𝑏𝑏𝑐𝑑𝑒
𝑆 → 𝑎𝐴𝐵𝑒 Input string: 𝑎𝑏𝑏𝑐𝑑𝑒

𝐴 → 𝐴𝑏𝑐 | 𝑏 𝑆 → 𝑎𝐴𝐵𝑒
𝐵→𝑑
→ 𝑎𝐴𝑑𝑒
CS 335: Bottom-up Parsing → 𝑎𝐴𝑏𝑐𝑑𝑒

→ 𝑎𝑏𝑏𝑐𝑑𝑒
Swarnendu Biswas
𝑟𝑚 𝑟𝑚 𝑟𝑚 𝑟𝑚
𝑆 𝑆 𝑆 𝑆 𝑆
Semester 2022-2023-II
𝑎 𝐴 𝐵 𝑒 𝑎 𝐴 𝐵 𝑒 𝑎 𝐴 𝐵 𝑒 𝑎 𝐴 𝐵 𝑒
CSE, IIT Kanpur
𝑑 𝐴 𝑏 𝑐 𝑑 𝐴 𝑏 𝑐 𝑑
Content influenced by many excellent references, see References slide for acknowledgements.
𝑏
CS 335 Swarnendu Biswas
Bottom-up Parsing
Bottom-up Parsing 𝑆 → 𝑎𝐴𝐵𝑒 Input string: 𝑎𝑏𝑏𝑐𝑑𝑒
𝐴 → 𝐴𝑏𝑐 | 𝑏 𝑎𝑏𝑏𝑐𝑑𝑒
Constructs the parse tree starting from the leaves and working up 𝐵→𝑑 → 𝑎𝐴𝑏𝑐𝑑𝑒
toward the root → 𝑎𝐴𝑑𝑒
→ 𝑎𝐴𝐵𝑒
𝑆 → 𝑎𝐴𝐵𝑒 Input string: 𝑎𝑏𝑏𝑐𝑑𝑒 →𝑆
𝐴 → 𝐴𝑏𝑐 | 𝑏 𝑆 → 𝑎𝐴𝐵𝑒 𝑎𝑏𝑏𝑐𝑑𝑒 𝑎𝑏𝑏𝑐𝑑𝑒 ⇒ 𝑎 𝐴 𝑏 𝑐 𝑑 𝑒 ⇒ 𝑎 𝐴 𝑑 𝑒 ⇒ 𝑎 𝐴 𝐵 𝑒 ⇒ 𝑆
𝐵→𝑑
→ 𝑎𝐴𝑑𝑒 → 𝑎𝐴𝑏𝑐𝑑𝑒
𝑏 𝐴 𝑏 𝑐 𝐴 𝑏 𝑐 𝑑 𝑎 𝐴 𝐵 𝑒
→ 𝑎𝐴𝑏𝑐𝑑𝑒 → 𝑎𝐴𝑑𝑒
→ 𝑎𝑏𝑏𝑐𝑑𝑒 → 𝑎𝐴𝐵𝑒 𝑏 𝑏 𝐴 𝑏 𝑐 𝑑
→𝑆 reverse of
rightmost 𝑏
derivation CS 335 Swarnendu Biswas
Reduction Handle
• Handle is a substring that matches the body of a production
• Bottom-up parsing reduces a string 𝑤 to the start symbol 𝑆 • Reducing the handle is one step in the reverse of the rightmost derivation
• At each reduction step, a chosen substring that is the RHS (or body) of a
production is replaced by the LHS (or head) nonterminal Right Sentential Form Handle Reducing Production
𝐸 →𝐸+𝑇|𝑇
id1 ∗ id2 id1 𝐹 → id
Rightmost derivation
𝑇 →𝑇∗𝐹|𝐹
𝐹 ∗ id2 𝐹 𝑇→𝐹
𝐹 → 𝐸 | id
𝑇 ∗ id2 id2 𝐹 → id
𝑆 𝛾0 𝛾1 𝛾2 … 𝛾𝑛−1 𝛾𝑛 = 𝑤 𝑇∗𝐹 𝑇∗𝐹 𝑇 →𝑇∗𝐹
𝑟𝑚 𝑟𝑚 𝑟𝑚 𝑟𝑚 𝑟𝑚 𝑟𝑚
𝑇 𝑇 𝐸→𝑇
Bottom-up Parser
Although 𝑇 is the body of the production 𝐸 → 𝑇, 𝑇 is not a handle in the sentential form 𝑇 ∗ id2 .
The leftmost substring that matches the body of some production need not be a handle.
CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas
Handle Handle
• If 𝑆 ∗ 𝛼𝐴𝑤 𝛼𝛽𝑤, then 𝐴 →
𝑟𝑚 𝑟𝑚 𝑆
𝛽 is a handle of 𝛼𝛽𝑤 If grammar 𝐺 is unambiguous, then every right sentential form has
only one handle
• String 𝑤 right of a handle must 𝐴
contain only terminals 𝛼 𝛽 𝑤
If 𝐺 is ambiguous, then there can be more than one rightmost
A handle 𝐴 → 𝛽 in the parse tree for derivation of 𝛼𝛽𝑤
𝛼𝛽𝑤

Shift-Reduce Parsing
• The input string (i.e., being parsed) consists of two parts
• Left part is a string of terminals and nonterminals, and is stored in stack
• Right part is a string of terminals read from an input buffer
• Bottom of the stack and end of input are represented by $
• Type of bottom-up parsing with two primary actions, shift and reduce
Shift-Reduce Parsing • Other obvious actions are accept and error
• Shift-Reduce actions
• Shift: shift the next input symbol from the right string onto the top of the
stack
• Reduce: identify a string on top of the stack that is the body of a production,
and replace the body with the head
Shift-Reduce Parsing Shift-Reduce Parsing

𝑇 →𝑇∗𝐹|𝐹
𝐹 → 𝐸 | id
• Initial Stack Input
$ 𝑤$ Stack Input Action
$ 𝐢𝐝1 ∗ 𝐢𝐝2 $ Shift
$𝐢𝐝1 ∗ 𝐢𝐝2 $ Reduce by 𝐹 → id
⇒
Shift Reduce
∗
$𝐹 ∗ 𝐢𝐝2 $ Reduce by 𝑇 → 𝐹
$𝑇 ∗ 𝐢𝐝2 $ Shift
$𝑇 ∗ 𝐢𝐝2 $ Shift
• Final goal Stack Input
$𝑇 ∗ 𝐢𝐝2 $ Reduce by 𝐹 → id
$𝑆 $
$𝑇 ∗ 𝐹 $ Reduce by 𝑇 → 𝑇 ∗ 𝐹
$𝑇 $ Reduce by 𝐸 → 𝑇
$𝐸 $ Accept Or report an error in
case of a syntax error
Handle on Top of the Stack Possible Choices in Rightmost Derivation
• Is the following scenario possible? 1. 𝑆
𝑟𝑚
𝛼𝐴𝑧
𝑟𝑚
𝛼𝛽𝐵𝑦𝑧
𝑟𝑚
𝛼𝛽𝛾𝑦𝑧 2. 𝑆
𝑟𝑚
𝛼𝐵𝑥𝐴𝑧
𝑟𝑚
𝛼𝐵𝑥𝑦𝑧
𝑟𝑚
𝛼𝛾𝑥𝑦𝑧
Stack Input Action 𝑆 𝑆

…
$ 𝛼𝛽𝛾 𝑤$ Reduce by 𝐴 → 𝛾
$ 𝛼𝛽𝐴 𝑤$ Reduce by 𝐵 → 𝛽 𝐴
$𝛼𝐵𝐴 𝑤$
… 𝐵 𝐴
𝐵
𝛼 𝛾 𝑥 𝑦 𝑧
𝛼 𝛽 𝛾 𝑦 𝑧
Handle on Top of the Stack Shift-Reduce Actions

• Is the following scenario possible? • Shift: shift the next input symbol from the right string onto the top of
the stack
Stack Input Action • Reduce: identify a string on top of the stack that is the body of a
… production, and replace the body with the head
Handle
$ 𝛼𝛽𝛾 always eventually appears on top of the
𝑤$ Reduce by 𝐴stack,
→𝛾 never
$ 𝛼𝛽𝐴 𝑤$ Reduce by 𝐵 → 𝛽
inside
$𝛼𝐵𝐴 𝑤$ How do you decide when to shift and when to reduce?
…

Steps in Shift-Reduce Parsers Challenges in Bottom-up Parsing
General shift-reduce technique Which action do you

If there is no handle on the stack, then shift pick when there is a
• Both shift and reduce are valid,
If there is a handle on the stack, then reduce choice? implies a shift-reduce conflict
• Bottom up parsing is essentially the process of detecting handles and

reducing them Which rule to use if
• Different bottom-up parsers differ in the way they detect handles reduction is possible
by more than one • Reduce-reduce conflict
rule?
𝐸 → 𝐸 + 𝐸 𝐸 ∗ 𝐸 id
Shift-Reduce Conflict Shift-Reduce Conflict
id + id ∗ id id + id ∗ id S𝑡𝑚𝑡 → if 𝐸𝑥𝑝𝑟 then 𝑆𝑡𝑚𝑡
| if 𝐸𝑥𝑝𝑟 then 𝑆𝑡𝑚𝑡 else 𝑆𝑡𝑚𝑡
Stack Input Action Stack Input Action
$ id + id ∗ id$ Shift $ id + id ∗ id$ Shift | 𝑜𝑡ℎ𝑒𝑟
… …
$𝐸 + 𝐸 ∗ id$ Reduce by 𝐸 → 𝐸 + 𝐸 $𝐸 + 𝐸 ∗ id$ Shift Stack Input Action
$𝐸 ∗ id$ Shift $𝐸 + 𝐸 ∗ id$ Shift
$𝐸 ∗ id$ Shift $𝐸 + 𝐸 ∗ id $ Reduce by 𝐸 → id … if 𝐸𝑥𝑝𝑟 then 𝑆𝑡𝑚𝑡 else … $
$𝐸 ∗ id $ Reduce by 𝐸 → id $𝐸 + 𝐸 ∗ 𝐸 $ Reduce by 𝐸 → 𝐸 ∗ 𝐸
$𝐸 ∗ 𝐸 $ Reduce by 𝐸 → 𝐸 ∗ 𝐸 $𝐸 + 𝐸 $ Reduce by 𝐸 → 𝐸 + 𝐸
What is a correct thing to do for
$𝐸 $ $𝐸 $
this grammar – shift or reduce?
E.g., we can prioritize shifts.
𝑀 →𝑅+𝑅 𝑅+𝑐 𝑅
Reduce-Reduce Conflict 𝑅→𝑐
𝑐+𝑐 𝑐+𝑐
Stack Input Action Stack Input Action

$ 𝑐 + 𝑐$ Shift $ 𝑐 + 𝑐$ Shift
$𝑐 +𝑐$ Reduce by 𝑅 → 𝑐 $𝑐 +𝑐$ Reduce by 𝑅 → 𝑐
$𝑅
$𝑅 +
+𝑐$ Shift
𝑐$ Shift
$𝑅
$𝑅 +
+𝑐$ Shift
𝑐$ Shift LR Parsing
$𝑅 + 𝑐 $ Reduce by 𝑅 → 𝑐 $𝑅 + 𝑐 $ Reduce by 𝑀 → 𝑅 + 𝑐
$𝑅 + 𝑅 $ Reduce by 𝑅 → 𝑅 + 𝑅 $𝑀 $
$𝑀 $
LR(k) Parsing Popularity of LR Parsing

• Popular bottom-up parsing scheme Can recognize almost all language constructs with CFGs
• L is for left-to-right scan of input, R is for reverse of rightmost derivation, k is
the number of lookahead symbols
• LR parsers are table-driven, like the non-recursive LL parser Most general nonbacktracking shift-reduce parsing method
• LR grammar is one for which we can construct an LR parsing table
Works for a superset of grammars parsed with predictive or LL parsers

• LL(k) parsing predicts which production to use having seen only the first k tokens of the right-
hand side
• LR(k) parsing can decide after it has seen input tokens corresponding to the entire right-hand
side of the production
Block Diagram of LR Parser LR Parsing
Input 𝑎1 … … 𝑎𝑖 … … 𝑎𝑛 $
• Remember the basic questions: when to shift and when to reduce!

LR Parsing • Information is encoded in a DFA constructed using canonical LR(0)
Stack 𝑠𝑚 Output
Program collection
𝑠𝑚−1 I. Augmented grammar 𝐺 ′ with new start symbol 𝑆 ′ and rule 𝑆 ′ → 𝑆
II. Define helper functions Closure() and Goto()
…
ACTION GOTO
$
Parse Table
The LR parsing driver is the same for all LR parsers, only the parsing
table (including ACTION and GOTO) changes across parser types
𝐸′ → 𝐸
LR(0) Item Production Items Closure Operation 𝐸 →𝐸+𝑇|𝑇
𝐴 → •𝑋𝑌𝑍 𝑇 →𝑇∗𝐹|𝐹
• An LR(0) item (also called 𝐴 → 𝑋•𝑌𝑍 • Let 𝐼 be a set of items for a 𝐹 → 𝐸 | id
item) of a grammar 𝐺 is a 𝐴 → 𝑋𝑌𝑍 grammar 𝐺
𝐴 → 𝑋𝑌•𝑍 Suppose 𝐼 = {𝐸 ′ → •𝐸 }
production of 𝐺 with a dot at • Closure(𝐼) is constructed as
𝐴 → 𝑋𝑌𝑍• Closure(𝐼) = {
some position in the body
follows: 𝐸 ′ → •𝐸,
• 𝐴 → •𝑋𝑌𝑍 indicates that we expect a 𝐸 → •𝐸 + 𝑇,
string derivable from 𝑋𝑌𝑍 next in the 1. Add every item in 𝐼 to Closure(𝐼)
• An item indicates how much input 2. If 𝐴 → 𝛼•𝐵𝛽 is in Closure(𝐼) and 𝐸 → •𝑇,
of a production we have seen • 𝐴 → 𝑋•𝑌𝑍 indicates that we saw a 𝐵 → 𝛾 is a rule, then add 𝐵 → •𝛾 𝑇 → •𝑇 ∗ 𝐹,
• Symbols on the left of “•” are string derivable from 𝑋 in the input, to Closure(𝐼) if not already 𝑇 → •𝐹,
already on the stack and we expect a string derivable from added 𝐹→• 𝐸 ,
• Symbols on the right of “•” are 𝑌𝑍 next in the input 3. Repeat until no more new items 𝐹 → •id
expected in the input • 𝐴 → ϵ generates only one item 𝐴 → • can be added to Closure(𝐼) }

Kernel and Nonkernel Items Goto Operation
• If one 𝐵-production is added to Closure(𝐼) with the dot at the left • Suppose 𝐼 is a set of items and 𝑋 is a grammar symbol
end, then all 𝐵-productions will be added to the closure • Goto(𝐼, 𝑋) is the closure of set all items [𝐴 → 𝛼𝑋•𝛽] such that [𝐴 →
• Kernel items 𝛼•𝑋𝛽] is in 𝐼
• Initial item 𝑆′ → •𝑆, and all items whose dots are not at the left end • If 𝐼 is a set of items for some valid prefix 𝛼, then Goto(𝐼,𝑋) is set of valid items
for prefix 𝛼𝑋
• Nonkernel items
• All items with their dots at the left end, except for 𝑆′ → •𝑆
• Intuitively, Goto(𝐼, 𝑋) defines the transitions in the LR(0) automaton
• Goto(𝐼, 𝑋) gives the transition from state 𝐼 under input 𝑋
Example of Goto Canonical Collection of Sets of LR(0) Items
𝐸′ → 𝐸
Goto(𝐼, +) = { 𝐶 = Closure({𝑆 ′ → •𝑆})
𝐸 →𝐸+𝑇|𝑇 𝐸 → 𝐸 + •𝑇, repeat
𝑇 →𝑇∗𝐹|𝐹 𝑇 → •𝑇 ∗ 𝐹, for each set of items 𝐼 in 𝐶
𝐹 → 𝐸 | id 𝑇 → •𝐹, for each grammar symbol 𝑋
𝐹→• 𝐸 ,
if Goto(𝐼, 𝑋) is not empty and not in 𝐶
Suppose 𝐼 = { 𝐹 → •id
add Goto(𝐼, 𝑋) to 𝐶
𝐸 ′ → 𝐸•, }
𝐸 → 𝐸• + 𝑇 until no new sets of items are added to 𝐶
}

Canonical Collection of Sets of LR(0) Items Canonical Collection of Sets of LR(0) Items
𝐸′ → 𝐸
• Compute the canonical 𝐼0 = Closure({𝐸 ′ → •𝐸})= { 𝐼2 = Goto(𝐼0 , 𝑇)= { 𝐼4 = Goto(𝐼0 , "(")= {
𝐸 ′ → •𝐸, 𝐸 → 𝑇•, 𝐹 → (•𝐸),
collection for the expression 𝐸 → •𝐸 + 𝑇, 𝑇 → 𝑇• ∗ 𝐹 𝐸 → •𝐸 + 𝑇,
𝑇 →𝑇∗𝐹|𝐹 grammar 𝐸 → •𝑇, } 𝐸 → •𝑇,
𝑇 → •𝑇 ∗ 𝐹, 𝑇 → •𝑇 ∗ 𝐹,
𝐹 → 𝐸 | id 𝑇 → •𝐹, 𝐼3 = Goto(𝐼0 , 𝐹)= { 𝑇 → •𝐹,
𝐹 → •(𝐸), 𝑇 → 𝐹• 𝐹 → •(𝐸),
𝐹 → •id, } 𝐹 → •id,
} }
𝐼5 = Goto(𝐼0 , id)= {
𝐼1 = Goto(𝐼0 , 𝐸) = { 𝐹 → id• 𝐼7 = Goto(𝐼2 ,∗) = {
𝐸 ′ → 𝐸•, } 𝑇 → 𝑇 ∗ •𝐹 ,
𝐸 → 𝐸• + 𝑇 𝐹 → •(𝐸),
} 𝐹 → •id
}
Canonical Collection of Sets of LR(0) Items LR(0) Automaton

𝐼6 = Goto(𝐼1 , +)= { 𝐼9 = Goto(𝐼6 , 𝑇)= { 𝐼2 = Goto(𝐼4 , 𝑇) • An LR parser makes shift-reduce decisions by maintaining states
𝐸 → 𝐸 + •𝑇, 𝐸 → 𝐸 + 𝑇•, 𝐼3 = Goto(𝐼4 , 𝐹)
𝑇 → •𝑇 ∗ 𝐹, 𝑇 → 𝑇• ∗ 𝐹
𝐼4 = Goto(𝐼4 , "(")
• Canonical LR(0) collection is used for constructing a DFA for parsing
𝑇 → •𝐹, }
𝐹 → •(𝐸), 𝐼5 = Goto(𝐼4 , id) • States represent sets of LR(0) items in the canonical LR(0) collection
𝐹 → •id, 𝐼10 = Goto(𝐼7 , 𝐹)= {
} 𝑇 → 𝑇 ∗ 𝐹•,
𝐼3 = Goto(𝐼6 , 𝐹) • Start state is Closure({𝑆 ′ → •𝑆}), where 𝑆 ′ is the start symbol of the
} 𝐼4 = Goto(𝐼6 , "(") augmented grammar
𝐼8 = Goto(𝐼4 , 𝐸)= { 𝐼5 = Goto(𝐼6 , id) • State 𝑗 refers to the state corresponding to the set of items 𝐼𝑗
𝐸 → 𝐸• + 𝑇, 𝐼11 = Goto(𝐼8 , ")")= { 𝐼4 = Goto(𝐼7 , "(")
𝐹 → (𝐸•) 𝐹 → (𝐸)•
𝐼5 = Goto(𝐼7 , id)
} }
𝐼6 = Goto(𝐼8 , +)
𝐼7 = Goto(𝐼9 ,∗)

Each state is associated with a unique
LR(0) Automaton grammar symbol
Use of LR(0) Automaton
$
Accept 𝐼1 𝑇 𝐼2 𝐹 𝐼3 𝐼5
𝐹 id • How can LR(0) automata help with shift-reduce decisions?
𝐸 id id
id • Suppose string 𝛾 of grammar symbols takes the automaton from start
+ ∗ state 𝑆0 to state 𝑆𝑗
( • Shift on next input symbol 𝑎 if 𝑆𝑗 has a transition on 𝑎
𝐼0 𝐼4 ( + 𝐼6 𝐼7
∗
( • Otherwise, reduce
𝑇 • Items in state 𝑆𝑗 help decide which production to use
( 𝐹
𝐸
𝐼8 𝐼9 𝐼10 𝐼11
)
Structure of LR Parsing Table Constructing LR(0) Parsing Table

• Assume 𝑆𝑖 is top of the stack and 𝑎𝑖 is the current input symbol 1) Construct LR(0) canonical collection 𝐶 = {𝐼0 , 𝐼1 , … , 𝐼𝑛 } for grammar
• Parsing table consists of two parts: an ACTION and a GOTO function 𝐺′
• ACTION table is indexed by state and terminal symbols, ACTION[𝑆𝑖 , 2) State 𝑖 is constructed from 𝐼𝑖
𝑎𝑖 ] can have four values a) If [𝐴 → 𝛼•𝑎𝛽] is in 𝐼𝑖 and Goto(𝐼𝑖 , 𝑎) = 𝐼𝑗 , then set ACTION[𝑖, 𝑎] = “Shift 𝑗”
i. Shift 𝑎𝑖 to the stack, go to state 𝑆𝑗 b) If [𝐴 → 𝛼•] is in 𝐼𝑖 , then set Action [𝑖, 𝑎] = “Reduce 𝐴 → 𝛼” for all 𝑎
ii. Reduce by rule 𝑘 c) If [𝑆 ′ → 𝑆•] is in 𝐼𝑖 , then set Action [𝑖, $] = “Accept”
iii. Accept 3) If Goto(𝐼𝑖 , 𝐴) = 𝐼𝑗 , then GOTO[𝑖, 𝐴] = 𝑗
iv. Error (empty cell in the table) 4) All entries left undefined are “errors”
• GOTO table is indexed by state and nonterminal symbols

LR(0) Parsing Table
ACTION GOTO
Shift-Reduce Parser with LR(0) Automaton
State
id + ∗ ( ) $ 𝐸 𝑇 𝐹 Stack Symbols Input Action
0 𝑠5 𝑠4 1 2 3 0 $ id ∗ id$ Shift
Popped 5,
1 𝑠6 𝑎𝑐𝑐 05 pushed 3 since $id ∗ id$ Reduce by 𝐹 → id
𝐼3 = Goto(𝐼0 , 𝐹)
2 𝑟2 𝑟2 𝑠7, 𝑟2 𝑟2 𝑟2 𝑟2 03 $𝐹 ∗ id$ Reduce by 𝑇 → 𝐹
3 𝑟4 𝑟4 𝑟4 𝑟4 𝑟4 𝑟4 02 $𝑇 ∗ id$ Shift
4 𝑠5 𝑠4 8 2 3 027 $𝑇 ∗ id$ Shift
5 𝑟6 𝑟6 𝑟6 𝑟6 𝑟6 𝑟6 0275 $𝑇 ∗ id $ Reduce by 𝐹 → id
6 𝑠5 𝑠4 9 3 0 2 7 10 $𝑇 ∗ 𝐹 $ Reduce by 𝑇 → 𝑇 ∗ 𝐹
7 𝑠5 𝑠4 10 02 $𝑇 $ Reduce by 𝐸 → 𝑇
8 𝑠6 𝑠11 01 $𝐸 $ Accept
9 𝑟1 𝑟1 𝑠7, 𝑟1 𝑟1 𝑟1 𝑟1
10 𝑟3 𝑟3 𝑟3 𝑟3 𝑟3 𝑟3 While the stack consisted of symbols in the shift-reduce parser,
here the stack contains states from the LR(0) automaton
11 𝑟5 𝑟5 𝑟5 𝑟5 𝑟5 𝑟5
Viable Prefix Example of a Viable Prefix

Stack Input
• Consider 𝐸 → 𝑇 → 𝑇 ∗ 𝐹 → 𝑇 ∗ id → 𝐹 ∗ id → id ∗ id 𝑆 → 𝑋1 𝑋2 𝑋3 𝑋4 $ 𝑋1 𝑋2 𝑋3 $
𝐴 → 𝑋1 𝑋2 $𝑋1 𝑋2 𝑋3 $
• id ∗ is a prefix of a right sentential form, but it can never appear on Let 𝑤 = 𝑋1 𝑋2 𝑋3
$𝑋1 𝑋2 𝑋3 $
the stack $𝐴 𝑋3 $
• Always reduce by 𝐹 → id before shifting ∗ (see previous slide) $𝐴𝑋3 $
• Not all prefixes of a right sentential form can appear on the stack 𝑋1 𝑋2 𝑋3 can never appear on the stack
• A viable prefix is a prefix of a right sentential form that can appear on
the stack of a shift-reduce parser Suppose there is a production 𝐴 → 𝛽1 𝛽2 , and 𝛼𝛽1 is on the stack.
• 𝛼 is a viable prefix if ∃𝑤 such that 𝛼𝑤 is a right sentential form • 𝛽2 ≠ 𝜖 implies the handle 𝛽1 𝛽2 is not at the top of the stack yet, so shift
• There is no error as long as the parser has viable prefixes on the stack • 𝛽2 = 𝜖 implies then the parser can reduce by the handle 𝐴 → 𝛽1

Challenges with LR(0) Parsing Challenges with LR(0) Parsing
• An LR(0) parser works only if each state with a reduce action has only • Consider the following grammar for adding numbers
one possible reduce action and no shift action
{ { 𝑆 →𝑆+𝐸|𝐸 𝑆 →𝐸+𝑆|𝐸
{
𝐿 → 𝐿, 𝑆• 𝐿 → 𝑆, 𝐿• 𝐸 → num 𝐸 → num
𝐿 → 𝐿, 𝑆•
𝑆 → 𝑆•, 𝐿 𝐿 → 𝑆•
} Left associative Right associative
} }
Ok Shift-reduce conflict Reduce-reduce conflict
Not 𝑆 → 𝐸• + 𝑆
𝑆 → 𝐸•
• Takes shift/reduce decisions without any lookahead token LR(0) Shift-reduce conflict
• Lacks the power to parse programming language grammars
Canonical Collection of Sets of LR(0) Items

FIRST 𝑆 = FIRST(𝐸) = {𝐧𝐮𝐦} 𝐼0 = Closure({𝑆 ′ → •S})= { 𝐼3 = Goto(𝐼0 , 𝐧𝐮𝐦)= {
FOLLOW 𝑆 = {$} 𝑆 ′ → •𝑆, 𝐸 → 𝐧𝐮𝐦•
FOLLOW 𝐸 = {+, $} 𝑆 → •𝐸 + 𝑆, }
𝑆 → •𝐸,
𝐸 → •num 𝐼4 = Goto(𝐼2 , +)= {
} 𝑆 → E +•S
}
𝐼1 = Goto(𝐼0 , 𝑆) = {
}
𝑆 ′ → 𝑆• 𝐼2 = Goto(𝐼0 , 𝐸)= {
𝑺 → 𝑬•+S,
Simple LR Parsing
𝑺 → E•
SLR(1)
Not }
LR(0)

Block Diagram of LR Parser LR Parsing Algorithm
Input 𝑎1 … … 𝑎𝑖 … … 𝑎𝑛 $ • The parser driver is same for all LR parsers
• Only the parsing table changes across parsers
• A shift-reduce parser shifts a symbol, and an LR parser shifts a state
Stack 𝑠𝑚
LR Parsing Output
Program • By construction, all transitions to state 𝑗 is for the same symbol 𝑋
𝑠𝑚−1
• Same driver program is used for • Each state, except the start state, has a unique grammar symbol associated
… all LR parsers with it
ACTION GOTO • Different LR parsing techniques
produce different parse tables
$
Parse Table
SLR(1) Parsing Constructing SLR Parsing Table

• Uses LR(0) items and LR(0) automaton, extends LR(0) parser to 1) Construct LR(0) canonical collection 𝐶 = {𝐼0 , 𝐼1 , … , 𝐼𝑛 } for grammar
eliminate a few conflicts 𝐺′
• For each reduction 𝐴 → 𝛽, look at the next symbol 𝑐 2) State 𝑖 is constructed from 𝐼𝑖
∗
• Apply reduction only if 𝑐 ∈ FOLLOW 𝐴 or 𝑐 = 𝜖 and 𝑆 ⇒ 𝛾𝐴 a) If [𝐴 → 𝛼•𝑎𝛽] is in 𝐼𝑖 and Goto(𝐼𝑖 , 𝑎) = 𝐼𝑗 , then set ACTION[𝑖, 𝑎] = “Shift 𝑗”
b) If [𝐴 → 𝛼•] is in 𝐼𝑖 , then set ACTION[𝑖, 𝑎] = “Reduce 𝐴 → 𝛼” for all 𝒂 in
FOLLOW(𝑨)
c) If [𝑆 ′ → 𝑆•] is in 𝐼𝑖 , then set Action [𝑖, $] = “Accept” Constraints on when
reductions are applied
3) If Goto(𝐼𝑖 , 𝐴) = 𝐼𝑗 , then GOTO[𝑖, 𝐴] = 𝑗
4) All entries left undefined are “errors”

SLR Parsing for Expression Grammar Canonical Collection of Sets of LR(0) Items
Rule # Rule • 𝑠𝑗 means shift and stack state 𝑖 𝐼0 = Closure({𝐸 ′ → •𝐸})= { 𝐼2 = Goto(𝐼0 , 𝑇)= { 𝐼4 = Goto(𝐼0 , "(")= {
𝐸 ′ → •𝐸, 𝐸 → 𝑇•, 𝐹 → (•𝐸),
1 𝐸 →𝐸+𝑇 • 𝑟𝑗 means reduce by rule #𝑗 𝐸 → •𝐸 + 𝑇, 𝑇 → 𝑇• ∗ 𝐹 𝐸 → •𝐸 + 𝑇,
𝐸 → •𝑇, } 𝐸 → •𝑇,
2 𝐸→𝑇 • 𝑎𝑐𝑐 means accept 𝑇 → •𝑇 ∗ 𝐹, 𝑇 → •𝑇 ∗ 𝐹,
3 𝑇 →𝑇∗𝐹 𝑇 → •𝐹, 𝐼3 = Goto(𝐼0 , 𝐹)= { 𝑇 → •𝐹,
• blank means error 𝐹 → •(𝐸), 𝑇 → 𝐹• 𝐹 → •(𝐸),
4 𝑇→𝐹 𝐹 → •id, } 𝐹 → •id,
FIRST 𝐸 = FIRST 𝑇 = FIRST 𝐹 = {(, id} } }
5 𝐹 → (𝐸) 𝐼5 = Goto(𝐼0 , id)= {
6 𝐹 → id FOLLOW 𝐸 = {$, +, )} 𝐼1 = Goto(𝐼0 , 𝐸) = { 𝐹 → id• 𝐼7 = Goto(𝐼2 ,∗) = {
𝐸 ′ → 𝐸•, } 𝑇 → 𝑇 ∗ •𝐹 ,
FOLLOW 𝑇 = {$,+, )}
𝐸 → 𝐸• + 𝑇 𝐹 → •(𝐸),
FOLLOW 𝐹 = {$, +,×, )} } 𝐹 → •id
}
LR(0) Automaton
Canonical Collection of Sets of LR(0) Items
$
Accept 𝐼1 𝑇 𝐼2 𝐹 𝐼3 𝐼5
𝐼6 = Goto(𝐼1 , +)= { 𝐼9 = Goto(𝐼6 , 𝑇)= { 𝐼2 = Goto(𝐼4 , 𝑇) 𝐹 id
𝐸 → 𝐸 + •𝑇, 𝐸 → 𝐸 + 𝑇•, 𝐼3 = Goto(𝐼4 , 𝐹) 𝐸 id id
𝑇 → •𝑇 ∗ 𝐹, 𝑇 → 𝑇• ∗ 𝐹 id
𝑇 → •𝐹, } 𝐼4 = Goto(𝐼4 , "(")
+ ∗
𝐹 → •(𝐸), 𝐼5 = Goto(𝐼4 , id)
𝐹 → •id, 𝐼10 = Goto(𝐼7 , 𝐹)= { (
𝐼3 = Goto(𝐼6 , 𝐹) 𝐼0 𝐼4 + 𝐼6 𝐼7
} 𝑇 → 𝑇 ∗ 𝐹•, ( ∗
} 𝐼4 = Goto(𝐼6 , "(") (
𝐼8 = Goto(𝐼4 , 𝐸)= { 𝐼5 = Goto(𝐼6 , id) 𝑇
( 𝐹
𝐸 → 𝐸• + 𝑇, 𝐼11 = Goto(𝐼8 , ")")= { 𝐼4 = Goto(𝐼7 , "(")
𝐹 → (𝐸•) 𝐹 → (𝐸)• 𝐸
𝐼5 = Goto(𝐼7 , id)
} }
𝐼6 = Goto(𝐼8 , +)
𝐼8 𝐼9 𝐼10 𝐼11
𝐼7 = Goto(𝐼9 ,∗)
)
SLR Parsing Table
ACTION GOTO
LR Parser Configurations
State
id + ∗ ( ) $ 𝐸 𝑇 𝐹
0 𝑠5 𝑠4 1 2 3 • A LR parser configuration is a pair <𝑠0 , 𝑠1 ,…, 𝑠𝑚 , 𝑎𝑖 𝑎𝑖+1 … 𝑎𝑛 $>
1 𝑠6 𝑎𝑐𝑐 • Left half is stack content, and right half is the remaining input
2 𝑟2 𝑠7 𝑟2 𝑟2 • Configuration represents the right sentential form 𝑋1 𝑋2 … 𝑋𝑚 𝑎𝑖 𝑎𝑖+1 …
3 𝑟4 𝑟4 𝑟4 𝑟4
𝑎𝑛
4 𝑠5 𝑠4 8 2 3
5 𝑟6 𝑟6 𝑟6 𝑟6
6 𝑠5 𝑠4 9 3
7 𝑠5 𝑠4 10
8 𝑠6 𝑠11
9 𝑟1 𝑠7 𝑟1 𝑟1
10 𝑟3 𝑟3 𝑟3 𝑟3
11 𝑟5 𝑟5 𝑟5 𝑟5
LR Parsing Program
LR Parsing Algorithm Let 𝑎 be the first symbol of input 𝑤$
while (1)
• If ACTION[𝑠𝑚 , 𝑎𝑖 ] = shift 𝑠, new configuration is <𝑠0 , 𝑠1 ,…, 𝑠𝑚 𝑠, 𝑎𝑖+1 … let 𝑠 be the top of the stack
𝑎𝑛 $> if ACTION[𝑎] == shift 𝑡
• If ACTION[𝑠𝑚 , 𝑎𝑖 ] = reduce 𝐴 → 𝛽, new configuration is <𝑠0 , 𝑠1 ,…, push 𝑡 onto the stack
let 𝑎 be the next input symbol
𝑠𝑚−𝑟 , 𝑎𝑖 𝑎𝑖+1 … 𝑎𝑛 $>, where 𝑟 = |𝛽| and 𝑠 = GOTO[𝑠𝑚−𝑟 , 𝐴] else if ACTION[𝑠, 𝑎] == reduce 𝐴 → 𝛽
• If ACTION[𝑠𝑚 , 𝑎𝑖 ] = accept, parsing is successful pop |𝛽| symbols off the stack
push GOTO[𝑡, 𝐴] onto the stack
• If ACTION[𝑠𝑚 , 𝑎𝑖 ] = error, parsing has discovered an error output production 𝐴 → 𝛽
else if ACTION[𝑠, 𝑎] == accept
break
else
invoke error recovery

Moves of an LR Parser on id ∗ id + id Moves of an LR Parser on id ∗ id + id
Stack Symbols Input Action Stack Symbols Input Action
1 0 id ∗ id + id$ Shift 11 0165 𝐸 + id $ Reduce by 𝐹 → id
2 05 id ∗ id + id$ Reduce by 𝐹 → id 12 0163 𝐸+𝐹 $ Reduce by 𝑇 → 𝐹
3 03 𝐹 ∗ id + id$ Reduce by 𝑇 → 𝐹 13 0169 𝐸+𝑇 $ Reduce by 𝐸 → 𝐸 + 𝑇
4 02 𝑇 ∗ id + id$ Shift 14 01 𝐸 $ Accept
5 027 𝑇∗ id + id$ Shift
6 0275 𝑇 ∗ id +id$ Reduce by 𝐹 → id
7 0 2 7 10 𝑇∗𝐹 +id$ Reduce by 𝑇 → 𝑇 ∗ 𝐹
8 02 𝑇 +id$ Reduce by 𝐸 → 𝑇
9 01 𝐸 +id$ Shift
10 016 𝐸+ id$ Shift
Limitations of SLR Parsing Limitations of SLR Parsing

• If an SLR parse table for a grammar does not have multiple entries in
Unambiguous grammar Example Derivation
any cell then the grammar is unambiguous
𝑆 ⇒ 𝐿 = 𝑅 ⇒ ∗𝑅 = 𝑅
𝑆 →𝐿 =𝑅|𝑅
𝐿 → ∗𝑅 | id
• Every SLR(1) grammar is unambiguous, but there are unambiguous
𝑅→𝐿
grammars that are not SLR(1)
FIRST 𝑆 = FIRST 𝐿 = FIRST 𝑅 = ∗, id
FOLLOW 𝑆 = FOLLOW 𝐿 = FOLLOW 𝑅

= =, $

SLR Parsing Table
Canonical LR(0) Collection
ACTION GOTO
𝐼0 = Closure(𝑆 ′ → •𝑆)= { 𝐼3 = Goto(𝐼0 , 𝑅)= { 𝐼5 = Goto(𝐼0 , id)= { State
𝑆 ′ → •𝑆, 𝑆 → 𝑅• 𝐿 → •id = ∗ id $ 𝑆 𝐿 𝑅
𝑆 → •𝐿 = 𝑅, } } 0 𝑠4 𝑠5 1 2 3
𝑆 → •𝑅, 𝐼4 =Goto(𝐼0 , 𝑅)= { 𝐼7 =Goto(𝐼4 , 𝑅)= {
1 𝑎𝑐𝑐
𝐿 → •∗𝑅, 𝐿 → ∗•𝑅, 𝐿 →∗ 𝑅•
𝐿 → •id, 𝑅 → •𝐿, } 2 𝒔𝟔, 𝒓𝟔 𝑟6
𝑅 → •𝐿 𝐿 → •∗𝑅, 𝐼8 = Goto(𝐼4 , 𝐿)= { 3
} 𝐿 → •id 𝑅 → 𝐿•
4 𝑠4 𝑠5 8 7
𝐼1 = Goto(𝐼0 , 𝑆) = { } }
𝑆 ′ → 𝑆• 𝐼6 = Goto(𝐼2 , ‘=‘)= { 𝐼9 = Goto(𝐼6 , 𝑅)= { 5 𝑟5 𝑟5
} 𝑆 → 𝐿 = •𝑅, 𝑆 → 𝐿 = 𝑅• 6 𝑠4 𝑠5 8 9
𝐼2 = Goto(𝐼0 , 𝐿)= { 𝑅 → •𝐿, }
7 𝑟4 𝑟4
𝑺 → 𝑳• = 𝑹, 𝐿 → •∗𝑅,
𝐿 → •id 8 𝑟6 𝑟6
𝑹 → 𝑳•
} } 9 𝑟2
Shift-Reduce Conflict with SLR Parsing Moves of an LR Parser on id=id

𝐼0 = Closure(𝑆 ′ →. 𝑆)= { 𝐼3 = Goto(𝐼0 , 𝑅)= { 𝐼5 = Goto(𝐼0 , id)= {
𝑆 ′ → •𝑆, 𝑆 → 𝑅• 𝐿 → •id Stack Input Action Stack Input Action
𝑆 → •𝐿 = 𝑅, } } 0 id=id$ Shift 5 0 id=id$ Shift 5
𝑆 → •𝑅, 𝐼4 =Goto(𝐼0 , 𝑅)= { 𝐼7 =Goto(𝐼4 , 𝑅)= {
𝐿 → • ∗ 𝑅, 𝐿 →∗ •𝑅, 𝐿 →∗ 𝑅• 0 id 5 =id$ Reduce by 𝐿 → id 0 id 5 =id$ Reduce by 𝐿 → id
1. ACTION[2,=]
𝐿 → •id, = Shift 6, or 𝑅 → •𝐿, }
0𝐿2 =id$ Reduce by 𝑹 → 𝑳 0𝐿2 =id$ Shift 6
𝐿 → • ∗ 𝑅,
}
𝑅 → •𝐿
2. ACTION[2,=] = Reduce 𝑅 → 𝐿 →𝐿•id since "=" ∈ 𝐼FOLLOW(𝑅)
8 = Goto(𝐼4 , 𝐿)= {
𝑅 → 𝐿• 0𝑅3 =id$ Error 0𝐿2=6 id$ Shift 5
𝐼1 = Goto(𝐼0 , 𝑆) = { } }
𝑆 ′ → 𝑆• 𝐼6 = Goto(𝐼2 , ‘=‘)= { 𝐼9 = Goto(𝐼6 , 𝑅)= { 0 𝐿 2 = 6 id 5 $ Reduce by 𝐿 → id
} 𝑆 → 𝐿 = •𝑅, 𝑆 → 𝐿 = 𝑅•
𝐼2 = Goto(𝐼0 , 𝐿)= { 𝑅 → •𝐿, } No right sentential form 0𝐿2=6𝐿8 $ Reduce by 𝑅 → 𝐿
𝑺 → 𝑳• = 𝑹, 𝐿 → •∗ 𝑅, begins with 𝑅 = ⋯
0𝐿2=6𝑅9 $ Reduce by 𝑆 → 𝐿 = 𝑅
𝑹 → 𝑳• 𝐿 → •id
} } 0𝑆1 $ Accept

Moves of an LR Parser on id=id Moves of an LR Parser on id=id
Stack Input Action Stack Input Action Stack Input Action Stack Input Action
0 State 𝑖 calls forid=id$
a reduction
Shift 5 by 𝐴 → 𝛼0 if the set of items
id=id$ 𝐼Shift
𝑖 contains
5 0 id=id$ Shift 5 0 id=id$ Shift 5
0 item
id 5 [𝐴 → 𝛼•] and =id$ 𝑎 ∈ FOLLOW(𝐴)
Reduce by 𝐿 → id 0 id 5 =id$ Reduce by 𝐿 → id 0 id 5 =id$ Reduce by 𝐿 → id 0 id 5 =id$ Reduce by 𝐿 → id
SLR parsers cannot remember the left context
0𝐿2 =id$ Reduce by 𝑅 → 𝐿 0𝐿2 =id$ Shift 6 0𝐿2 =id$ Reduce by 𝑅 → 𝐿 0𝐿2 =id$ Shift 6
• SLR(1) states only tell us about the sequence on top of the stack, n
0 •𝑅 3Suppose 𝛽𝐴 is=id$
a viable
Error prefix on top
0 𝐿 2of
= the
6 stack id$ Shift 5 0𝑅3 =id$ Error 0𝐿2=6 id$ Shift 5
ot what is below on the stack
• There may be no right sentential form 6 id 5 𝑎 follows
0 𝐿 2 =where 𝛽𝐴 by 𝐿 → id
$ Reduce 0 𝐿 2 = 6 id 5 $ Reduce by 𝐿 → id
• Parser should not reduce by 𝐴 → 𝛼
0𝐿2=6𝐿8 $ Reduce by 𝑅 → 𝐿 0𝐿2=6𝐿8 $ Reduce by 𝑅 → 𝐿
0𝐿2=6𝑅9 $ Reduce by 𝑆 → 𝐿 = 𝑅 0𝐿2=6𝑅9 $ Reduce by 𝑆 → 𝐿 = 𝑅
0𝑆1 $ Accept 0𝑆1 $ Accept
LR(1) Item
• An LR(1) item of a CFG 𝐺 is a string of the form [𝐴 → 𝛼•𝛽, 𝑎], with 𝑎
as one symbol lookahead
• 𝐴 → 𝛼𝛽 is a production in 𝐺, and 𝑎 ∈ 𝑇 ∪ {$}
• Suppose [𝐴 → 𝛼•𝛽, 𝑎] where 𝛽 ≠ 𝜖, then the lookahead is not

Canonical LR Parsing required
• If [𝐴 → 𝛼•, 𝑎], reduce only if next input symbol is 𝑎
• Set of possible terminals will always be a subset of FOLLOW(𝐴), but can be a
proper subset

LR(1) Item Constructing LR(1) Sets of Items
Input … … 𝑤 $
• An LR(1) item [𝐴 → 𝛼•𝛽, 𝑎] is Closure(𝐼) Goto(𝐼, 𝑋)

valid for a viable prefix 𝛾 if there
is a derivation repeat initialize 𝐽 to be the empty set
LR Parsing
Stack 𝛽 for each item [𝐴 → 𝛼•𝐵𝛽, 𝑎] in 𝐼 for each item [𝐴 → 𝛼•𝑋𝛽, 𝑎] in 𝐼
𝑆 ∗
𝛿𝐴𝑤 𝛿𝛼𝛽𝑤 Program
𝑟𝑚 𝑟𝑚 𝛼 for each production 𝐵 → 𝛾 in 𝐺 ′ add item [𝐴 → 𝛼𝑋•𝛽, 𝑎] to set 𝐽
for each terminal 𝑏 in FIRST(𝛽𝑎) return Closure(𝐽)
where 𝛿
add [𝐵 → •𝛾, 𝑏] to set 𝐼
i. 𝛾 = 𝛿𝛼, and … until no more items are added to 𝐼
ii. 𝑎 is the first symbol in 𝑤, or, $ return 𝐼
𝑤 = 𝜖 and 𝑎 = $
Constructing LR(1) Sets of Items Example Construction of LR(1) Items

Rule # Production 𝐼0 = Closure([𝑆 ′ → •𝑆, $])= {
Items(𝐺 ′ ): 𝑆 ′ → •𝑆, $,
0 𝑆′ → 𝑆 𝑆 → •𝐶𝐶, $,
C = Closure({[𝑆 ′ → •𝑆, $]}) 1 𝑆 → 𝐶𝐶 𝐶 → •c𝐶, c/𝐝,
𝐶 → •d, 𝐜/𝐝
repeat 2 𝐶 → c𝐶
}
3 𝐶→d
for each set of items 𝐼 in 𝐶 𝐼1 = Goto(𝐼0 , 𝑆) = {
for each grammar symbol 𝑋 𝑆 ′ → 𝑆•, $
generates the regular }
if Goto(𝐼, 𝑋) ≠ 𝜙 and Goto(𝐼, 𝑋) ∉ 𝐶 language c ∗ dc∗ d
add Goto(𝐼, 𝑋) to 𝐶
until no new sets of items are added to 𝐶

Example Construction of LR(1) Items LR(1) Automaton
𝐼0 = Closure([𝑆 ′ →. 𝑆, $])= { 𝐼3 = Goto(𝐼0 , 𝐜)= { 𝐼6 = Goto(𝐼2 , 𝐜)= {
𝑆 ′ → •𝑆, $, 𝐶 → 𝐜•𝐶, 𝐜/𝐝, 𝐶 → 𝐜•𝐶, $, 𝐼1 𝐼4 𝐶 𝐼5
𝑆 → •𝐶𝐶, $, 𝐶 → •𝐜𝐶, 𝐜/𝐝, 𝐶 → •𝐜𝐶, $, 𝑑
𝐶 → •𝐜𝐶, 𝐜/𝐝, 𝐶 → •𝐝, 𝐜/𝐝 𝐶 → •𝐝, $ 𝑆
𝐶 → •𝐝, 𝐜/𝐝 } } 𝑐
} 𝑐
𝐼4 = Goto(𝐼0 , 𝑑) = { 𝐼7 = Goto(𝐼2 , 𝐝) = {
𝐼1 = Goto(𝐼0 , 𝑆) = { 𝐶 → 𝐝•, 𝐜/𝐝 𝐶 → 𝐝•, $ 𝐶
𝑆 ′ → 𝑆•, $ } } 𝐼0 𝐼2 𝐼3 𝐼6
} 𝑐
𝐼5 = Goto(𝐼2 , 𝐶) = { 𝐼8 = Goto(𝐼3 , 𝐶) = { 𝑐
𝐼2 = Goto(𝐼0 , 𝐶)= { 𝐶 → 𝐶𝐶•, $ 𝐶 → 𝐜𝐶•, 𝐜/𝐝 𝐶
𝑆 → 𝐶•𝐶, $, } } 𝐶
𝐶 → •𝐜𝐶, $,
𝐶 → •𝐝, $ 𝐼9 = Goto(𝐼6 , 𝐶)= {
} 𝐶 → 𝐜𝐶•, $ 𝐼7 𝐼8 𝐼9
}
Construction of Canonical LR(1) Parsing Tables Canonical LR(1) Parsing Table

ACTION GOTO
• Construct 𝐶 ′ = {𝐼0 , 𝐼1 , … , 𝐼𝑛 } State
𝐜 𝐝 $ 𝑆 𝐶
• State 𝑖 of the parser is constructed from 𝐼𝑖 0 𝑠3 𝑠4 1 2
• If [𝐴 → 𝛼•𝑎𝛽, 𝑏] is in 𝐼𝑖 and Goto(𝐼𝑖 ,𝑎) = 𝐼𝑗 , then set ACTION[𝑖, 𝑎]=“shift 𝑗” 1 𝑎𝑐𝑐
• If [𝐴 → 𝛼•, 𝑎] is in 𝐼𝑖 , 𝐴 ≠ 𝑆 ′ , then set ACTION[𝑖,𝑎]=“reduce 𝐴 → 𝛼•” 2 𝑠6 𝑠7 5
• If [𝑆 ′ → 𝑆•, $] is in 𝐼𝑖 , then set ACTION[𝑖,$]=“accept” 3 𝑠3 𝑠4 8
4 𝑟3 𝑟3
• If Goto(𝐼𝑖 , 𝐴)= 𝐼𝑗 , then GOTO[𝑖, 𝐴] = 𝑗 5 𝑟1
• Initial state of the parser is constructed from the set of items 6 𝑠6 𝑠7 9
containing [𝑆 ′ → •𝑆, $] 7 𝑟3
8 𝑟2 𝑟2
9 𝑟2

Moves of a CLR Parser on 𝐜𝐝𝐜𝐝 Canonical LR(1) Parsing
Stack Symbols Input Action
1 0 cdcd$ Shift
• If the parsing table has no multiply-defined cells, then the
2 03 c dcd$ Shift
corresponding grammar 𝐺 is LR(1)
3 034 𝐜𝐝 cd$ Reduce by 𝐶 → 𝐝
4 038 c𝐶 cd$ Reduce by 𝐶 → c𝐶

• Every SLR(1) grammar is an LR(1) grammar
5 02 𝐶 cd$ Shift
• Canonical LR parser may have more states than SLR
6 026 𝐶c d$ Shift
7 0267 𝐶cd $ Reduce by 𝐶 → d
8 0269 𝐶c𝐶 $ Reduce by 𝐶 → c𝐶
9 025 𝐶𝐶 $ Reduce by 𝑆 → 𝐶𝐶
10 01 𝑆 $ Accept
Example Construction of LR(1) Items

𝐼0 = Closure([𝑆 ′ →. 𝑆, $])= { 𝑰𝟑 = Goto(𝑰𝟎 , 𝒄)= { 𝑰𝟔 = Goto(𝑰𝟐 , 𝒄)= {
𝑆 ′ → •𝑆, $, 𝑪 → 𝒄•𝑪, 𝒄/𝒅, 𝑪 → 𝒄•𝑪, $,
𝑆 → •𝐶𝐶, $, 𝑪 → •𝒄𝑪, 𝒄/𝒅, 𝑪 → •𝒄𝑪, $,
𝐶 → •𝑐𝐶, 𝑐/𝑑, 𝑪 → •𝒅, 𝒄/𝒅 𝑪 → •𝒅, $
𝐶 → •𝑑, 𝑐/𝑑 } }
}
𝑰𝟒 = Goto(𝑰𝟎 , 𝒅) = { 𝑰𝟕 = Goto(𝑰𝟐 , 𝒅) = {
𝐼1 = Goto(𝐼0 , 𝑆) = { 𝑪 → 𝒅•, 𝒄/𝒅 𝑪 → 𝒅•, $
𝑆 ′ → 𝑆•, $ } }
LALR Parsing }
𝐼2 = Goto(𝐼0 , 𝐶)= {
𝐼5 = Goto(𝐼2 , 𝐶) = {
𝐶 → 𝐶𝐶•, $
𝑰𝟖 = Goto(𝑰𝟑 , 𝑪) = {
𝑪 → 𝒄𝑪•, 𝒄/𝒅
𝑆 → 𝐶•𝐶, $, } }
𝐶 → •𝑐𝐶, $,
𝐶 → •𝑑, $ 𝑰𝟗 = Goto(𝑰𝟔 , 𝑪)= {
} 𝐼3 and 𝐼6 , 𝐼4 and 𝐼7 , and 𝐼8 and 𝐼9 𝑪 → 𝒄𝑪•, $
}
only differ in the second components
Lookahead LR (LALR) Parsing Construction of LALR Parsing Table
• CLR(1) parser has a large number of states • Construct 𝐶 = {𝐼0 , 𝐼1 , … , 𝐼𝑛 }, the collection of sets of LR(1) items
• Lookahead LR (LALR) parser • For each core present in LR(1) items, find all sets having the same
• Merge sets of LR(1) items that have the same core (set of LR(0) items, i.e., core and replace these sets by their union
first component) • Let 𝐶 ′ = {𝐽0 , 𝐽1 , … , 𝐽𝑛 } be the resulting sets of LR(1) items (also called
• LALR parsers have fewer states, same as SLR LALR collection)
• LALR parser is used in many parser generators (e.g., Yacc and Bison) • Construct ACTION table as was done earlier, parsing actions for state 𝑖
is constructed from 𝐽𝑖
• Let 𝐽 = 𝐼1 ∪ 𝐼2 ∪ ⋯ ∪ 𝐼𝑘 , where the cores of 𝐼1 , 𝐼2 , … , 𝐼𝑘 are same
• Cores of Goto(𝐼1 , 𝑋), Goto(𝐼2 , 𝑋), …, Goto(𝐼𝑘 , 𝑋) will also be the same
• Let 𝐾 = Goto(𝐼1 , 𝑋) ∪ Goto(𝐼2 , 𝑋) ∪ … ∪ Goto(𝐼𝑘 , 𝑋), then Goto(𝐽,𝑋) = 𝐾
LALR Grammar LALR Parsing Table

• If there are no parsing action conflicts, then the grammar is LALR(1) ACTION GOTO
State
𝑐 𝑑 $ 𝑆 𝐶
0 𝑠36 𝑠47 1 2
Rule # Production 𝐼36 = Goto(𝐼0 , 𝑐)= { 𝐼89 = Goto(𝐼3 , 𝐶) = { 1 𝑎𝑐𝑐

𝐶 → 𝑐•𝐶, 𝑐/𝑑/$, 𝐶 → 𝑐𝐶•, 𝑐/𝑑/$ 2 𝑠36 𝑠47 5
0 𝑆′ → 𝑆
𝐶 → •𝑐𝐶, 𝑐/𝑑/$, } 36 𝑠36 𝑠47 89
1 𝑆 → 𝐶𝐶 𝐶 → •𝑑, 𝑐/𝑑/$
} 47 𝑟3 𝑟3 𝑟3
2 𝐶 → 𝑐𝐶
5 𝑟1
3 𝐶→𝑑 𝐼47 = Goto(𝐼0 , 𝑑) = {
89 𝑟2 𝑟2 𝑟2
𝐶 → 𝑑•, 𝑐/𝑑/$
}

Moves of a LALR Parser on 𝐜𝐝𝐜𝐝 Notes on LALR Parsing Table
Stack Symbols Input Action
1 0 cdcd$ Shift
• LALR parser behaves like the CLR parser excepting difference in stack
2 0 36 c dcd$ Shift
states
3 0 36 47 𝐜𝐝 cd$ Reduce by 𝐶 → 𝐝
• Merging LR(1) items can never produce shift/reduce conflicts
4 0 36 89 c𝐶 cd$ Reduce by 𝐶 → c𝐶
• Suppose there is a shift-reduce conflict on lookahead 𝑎 due to items [𝐵 →
5 02 𝐶 cd$ Shift 𝛽•𝑎𝛾, 𝑏] and [𝐴 → 𝛼•, 𝑎]
6 0 2 36 𝐶c d$ Shift • But merged state was formed from states with same cores, which implies
[𝐵 → 𝛽•𝑎𝛾, 𝑐] and [𝐴 → 𝛼•, 𝑎] must have already been in the same state, for
7 0 2 36 47 𝐶cd $ Reduce by 𝐶 → d
some value of 𝑐
8 0 2 36 89 𝐶c𝐶 $ Reduce by 𝐶 → c𝐶
• Merging items may produce reduce/reduce conflicts
9 025 𝐶𝐶 $ Reduce by 𝑆 → 𝐶𝐶
10 01 𝑆 $ Accept
# Production
Dealing with Errors with LALR Parsing 0 𝑆′ → 𝑆
Reduce-Reduce Conflicts due to Merging 1

2
𝑆 → 𝐶𝐶
𝐶 → c𝐶
• Consider an erroneous input ccd 3 𝐶→d
LR(1) grammar { 𝐴 → 𝑐•, 𝑑 , [𝐵 → 𝑐•, 𝑒]} { 𝐴 → 𝑐•, 𝑒 , [𝐵 → 𝑐•, 𝑑]} CLR Parsing Table LALR Parsing Table
Action Goto Action Goto
𝑆′ → 𝑆 State
State
𝑆 → 𝑎𝐴𝑑 | 𝑏𝐵𝑑|𝑎𝐵𝑒|𝑏𝐴𝑒 0 𝑠3 𝑠4 1 2 0 𝑠36 𝑠47 1 2
𝐴→𝑐 { 𝐴 → 𝑐•, 𝑑/𝑐 , [𝐵 → 𝑐•, 𝑑/𝑒]} 1 𝑎𝑐𝑐 1 𝑎𝑐𝑐
𝐵→𝑐 2 𝑠6 𝑠7 5 2 𝑠36 𝑠47 5
3 𝑠3 𝑠4 8 36 𝑠36 𝑠47 89
𝑎𝑐𝑑, 𝑎𝑐𝑒, 𝑏𝑐𝑑, 𝑏𝑐𝑒 4 𝑟3 𝑟3 47 𝑟3 𝑟3 𝑟3
5 𝑟1 5 𝑟1
6 𝑠6 𝑠7 9 89 𝑟2 𝑟2 𝑟2
7 𝑟3
8 𝑟2 𝑟2
9 𝑟2

Comparing Moves of CLR and LALR Parsers Comparing Moves of CLR and LALR Parsers
• Consider an erroneous input ccd • Consider an erroneous input ccd
CLR Parsing Table LALR Parsing Table CLR Parsing Table LALR Parsing Table
Stack Symbols Input Action Stack Symbols Input Action Stack Symbols Input Action Stack Symbols Input Action
0 ccd$ Shift 0 ccd$ Shift 0• CLR parser will
ccd$notShift
even reduce before
0 reporting anccd$
error
Shift
03 c cd$ Shift 0 36 c cd$ Shift 03 c cd$ Shift 0 36 c cd$ Shift
033 𝐜𝐜 d$ Shift 0 36 36 𝐜𝐜 d$ Shift

• SLR and
033 𝐜𝐜
LALR parsers
d$ Shift
may reduce0 36several
36
times
𝐜𝐜
before reporting an
d$ Shift
error, but will never shift an erroneous input symbol onto the stack
0334 ccd $ Error 0 36 36 47 ccd $ Reduce by 𝐶 → d 0334 ccd $ Error 0 36 36 47 ccd $ Reduce by 𝐶 → d
0 36 36 89 cc𝐶 $ Reduce by 𝐶 → c𝐶 0 36 36 89 cc𝐶 $ Reduce by 𝐶 → c𝐶
0 36 89 c𝐶 $ Reduce by 𝐶 → c𝐶 0 36 89 c𝐶 $ Reduce by 𝐶 → c𝐶
02 𝐶 $ Error 02 𝐶 $ Error
Dealing with Ambiguous Grammars

𝐼2 = Goto(𝐼0 , ′(′)= { 𝐼5 = Goto(𝐼0 , ‘∗’) = {
𝐸 → (•𝐸), 𝐸 → 𝐸 ∗ •𝐸,
𝐸′ → 𝐸 𝐸 → •𝐸 + 𝐸, 𝐸 → •𝐸 + 𝐸,
𝐸 →𝐸+𝐸 𝐸∗𝐸 𝐸 | id 𝐸 → •𝐸 ∗ 𝐸, 𝐸 → •𝐸 ∗ 𝐸,
𝐸 → •(𝐸), 𝐸 → •(𝐸),
𝐸 → •id 𝐸 → •id
𝐼0 = Closure({𝐸 ′ → •𝐸})= { } }
𝐸 ′ → •𝐸, 𝐼3 = Goto(𝐼0 , id)= { 𝐼6 = Goto(𝐼2 , 𝐸) = {
𝐸 → •𝐸 + 𝐸, 𝐸 → id• 𝐸 → (𝐸•),
𝐸 → •𝐸 ∗ 𝐸, } 𝐸 → 𝐸• + 𝐸,
𝐸 → •(𝐸), 𝐼4 = Goto(𝐼0 , ‘+’) = { 𝐸 → 𝐸• ∗ 𝐸,
𝐸 → 𝐸 + •𝐸, }
Using Ambiguous Grammars }
𝐸 → •id
𝐼1 = Goto(𝐼0 , 𝐸) = {
𝐸 → •𝐸 + 𝐸,
𝐸 → •𝐸 ∗ 𝐸,
𝐸 → •(𝐸),
𝐼7 = Goto(𝐼4 , 𝐸) = {
𝑬 → 𝑬 + 𝑬•,
𝑬 → 𝑬• + 𝑬,
𝐸 ′ → 𝐸•,
𝐸 → 𝐸• + 𝐸, 𝐸 → •id 𝑬 → 𝑬• ∗ 𝑬
𝐸 → 𝐸• ∗ 𝐸 } }
} 𝐼9 = Goto(𝐼6 , ‘)’)= { 𝐼8 = Goto(𝐼5 , 𝐸) = {
𝐸 → (𝐸)• 𝑬 → 𝑬 ∗ 𝑬•,
}
Does not specify the associativity and 𝑬 → 𝑬• + 𝑬,
precedence of the two operators 𝑬 → 𝑬• ∗ 𝑬
}
SLR(1) Parsing Table Moves of an SLR Parser on id + id ∗ id
ACTION GOTO
State
id + ∗ ( ) $ 𝐸 Stack Symbols Input Action
0 𝑠3 𝑠2 1 1 0 id + id ∗ id$ Shift 3
1 𝑠4 𝑠5 𝑎𝑐𝑐 2 03 id +id ∗ id$ Reduce by 𝐸 → id
2 𝑠3 𝑠2 6
3 01 𝐸 +id ∗ id$ Shift 4
3 𝑟4 𝑟4 𝑟4 𝑟4
4 014 𝐸+ id ∗ id$ Shift 3
4 𝑠3 𝑠2 7
5 0143 𝐸 + id ∗ id$ Reduce by 𝐸 → id
5 𝑠3 𝑠2 8
6 𝑠4 𝑠5 𝑠9 6 0147 𝐸+𝐸 ∗ id$
7 𝑠4, 𝑟1 𝑠5, 𝑟1 𝑟1 𝑟1
8 𝑠4, 𝑟2 𝑠5, 𝑟2 𝑟2 𝑟2
What can the parser do to
resolve the ambiguity?
9 𝑟3 𝑟3 𝑟3 𝑟3
SLR(1) Parsing Table

Action Goto
State
id + ∗ ( ) $ 𝐸
0 𝑠3 𝑠2 1
1 𝑠4 𝑠5 𝑎𝑐𝑐
2 𝑠3 𝑠2 6
3 𝑟4 𝑟4 𝑟4 𝑟4
4 𝑠3
5 Why did 𝑠3
the parser make these choices?
𝑠2
𝑠2
7
8 Summary
6 𝑠4 𝑠5 𝑠9
7 𝑠4, 𝒓𝟏 𝒔𝟓, 𝑟1 𝑟1 𝑟1
8 𝑠4, 𝒓𝟐 𝑠5, 𝒓𝟐 𝑟2 𝑟2
9 𝑟3 𝑟3 𝑟3 𝑟3

Comparison across LR Parsing Techniques Summary
• SLR(1) = LR(0) items + FOLLOW • Bottom-up parsing is a more powerful technique compared to top-
• SLR(1) parsers can parse a larger number of grammars than LR(0) down parsing
• Any grammar that can be parsed by an LR(0) parser can be parsed by an • LR grammars can handle left recursion
SLR(1) parser • Detects errors as soon as possible, and allows for better error recovery
• Automated parser generators such as Yacc and Bison implement LALR
• SLR(1) ≤ LALR(1) ≤ LR(1) parsing
• SLR(k) ≤ LALR(k) ≤ LR(k)
• LL(k) ≤ LR(k)
• Ambiguous grammars are not LR
References
• A. Aho et al. Compilers: Principles, Techniques, and Tools, 2nd edition, Chapter 4.5-4.8.
• K. Cooper and L. Torczon. Engineering a Compiler, 2nd edition, Chapter 3.4.

5 Compiler - Slide

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

5 Compiler - Slide

Uploaded by

Copyright:

Available Formats

Rightmost Derivation of 𝑎𝑏𝑏𝑐𝑑𝑒

𝑆 → 𝑎𝐴𝐵𝑒 Input string: 𝑎𝑏𝑏𝑐𝑑𝑒

CS 335: Bottom-up Parsing → 𝑎𝐴𝑏𝑐𝑑𝑒

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

Shift-Reduce Parsing Shift-Reduce Parsing

Stack Input Action 𝑆 𝑆

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

Handle on Top of the Stack Shift-Reduce Actions

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

General shift-reduce technique Which action do you

• Bottom up parsing is essentially the process of detecting handles and

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

Stack Input Action Stack Input Action

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

LR(k) Parsing Popularity of LR Parsing

Works for a superset of grammars parsed with predictive or LL parsers

• Remember the basic questions: when to shift and when to reduce!

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

Example of Goto Canonical Collection of Sets of LR(0) Items

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

Canonical Collection of Sets of LR(0) Items LR(0) Automaton

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

Structure of LR Parsing Table Constructing LR(0) Parsing Table

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

Viable Prefix Example of a Viable Prefix

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

Canonical Collection of Sets of LR(0) Items

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

SLR(1) Parsing Constructing SLR Parsing Table

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

2 05 id ∗ id + id$ Reduce by 𝐹 → id 12 0163 𝐸+𝐹 $ Reduce by 𝑇 → 𝐹

3 03 𝐹 ∗ id + id$ Reduce by 𝑇 → 𝐹 13 0169 𝐸+𝑇 $ Reduce by 𝐸 → 𝐸 + 𝑇

4 02 𝑇 ∗ id + id$ Shift 14 01 𝐸 $ Accept

5 027 𝑇∗ id + id$ Shift

6 0275 𝑇 ∗ id +id$ Reduce by 𝐹 → id

7 0 2 7 10 𝑇∗𝐹 +id$ Reduce by 𝑇 → 𝑇 ∗ 𝐹

10 016 𝐸+ id$ Shift

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

Limitations of SLR Parsing Limitations of SLR Parsing

FOLLOW 𝑆 = FOLLOW 𝐿 = FOLLOW 𝑅

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

Shift-Reduce Conflict with SLR Parsing Moves of an LR Parser on id=id

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

0𝐿2=6𝑅9 $ Reduce by 𝑆 → 𝐿 = 𝑅 0𝐿2=6𝑅9 $ Reduce by 𝑆 → 𝐿 = 𝑅

0𝑆1 $ Accept 0𝑆1 $ Accept

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

• Suppose [𝐴 → 𝛼•𝛽, 𝑎] where 𝛽 ≠ 𝜖, then the lookahead is not

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

• An LR(1) item [𝐴 → 𝛼•𝛽, 𝑎] is Closure(𝐼) Goto(𝐼, 𝑋)

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

Constructing LR(1) Sets of Items Example Construction of LR(1) Items

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas

CS 335 Swarnendu Biswas CS 335 Swarnendu Biswas