
# MELJUN P. CORTES, MBA, MPA, BSCS, ACS

Fall 2008
CSC 3130: Automata theory and formal languages

LR(k) grammars

Andrej Bogdanov
http://www.cse.cuhk.edu.hk/~andrejb/csc3130

LR(0) example from last time
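A → aAb | ab

State 1: A → •aAb, A → •ab
State 2: A → a•Ab, A → a•b, A → •aAb, A → •ab
State 3: A → ab•
State 4: A → aA•b
State 5: A → aAb•

Transitions: 1 --a--> 2, 2 --a--> 2, 2 --b--> 3, 2 --A--> 4, 4 --b--> 5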

LR(0) parsing example revisited
A → aAb | ab

Stack      Input   Action
1          aabb    S
1a2        abb     S
1a2a2      bb      S
1a2a2b3    b       R
1a2A4      b       S
1a2A4b5    ε       R
1A         ε

DFA states: 1 = {A → •aAb, A → •ab}, 2 = {A → a•Ab, A → a•b, A → •aAb, A → •ab}, 3 = {A → ab•}, 4 = {A → aA•b}, 5 = {A → aAb•}
DFA transitions: 1 --a--> 2, 2 --a--> 2, 2 --b--> 3, 2 --A--> 4, 4 --b--> 5

Derivation recovered: A ⇒ aAb ⇒ aabb
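The trace above can be reproduced with a small stack machine. This is only a sketch: the tables `GOTO` and `REDUCE` are hand-copied from the five-state DFA on the slide, and the function name `lr0_parse` and the "accept"/"error" labels are illustrative choices, not part of the lecture.

```python
# DFA for A -> aAb | ab, with states numbered as on the slide.
GOTO = {(1, "a"): 2, (2, "a"): 2, (2, "b"): 3, (2, "A"): 4, (4, "b"): 5}
# Reduce states: state -> (left-hand side, length of right-hand side).
REDUCE = {3: ("A", 2), 5: ("A", 3)}


def lr0_parse(word):
    """Return the list of (stack, remaining input, action) steps."""
    stack = [1]  # alternating states and symbols, e.g. [1, 'a', 2]
    pos = 0
    steps = []
    while True:
        rest = word[pos:]
        shown = "".join(str(x) for x in stack)
        if stack == [1, "A"]:
            # Only the start symbol is left: accept if the input is used up.
            steps.append((shown, rest, "accept" if not rest else "error"))
            return steps
        state = stack[-1]
        if state in REDUCE:
            lhs, n = REDUCE[state]
            steps.append((shown, rest, "R"))
            del stack[-2 * n:]  # pop n (symbol, state) pairs
            stack.append(lhs)
            nxt = GOTO.get((stack[-2], lhs))
            if nxt is not None:
                stack.append(nxt)
        elif rest and (state, rest[0]) in GOTO:
            steps.append((shown, rest, "S"))
            stack += [rest[0], GOTO[(state, rest[0])]]
            pos += 1
        else:
            steps.append((shown, rest, "error"))
            return steps


for stack, rest, action in lr0_parse("aabb"):
    print(f"{stack:10} {rest or 'ε':6} {action}")
```

Because the grammar is LR(0), every state is either a shift state or a reduce state, so the loop never has to look at the input to choose between shifting and reducing.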

Meaning of LR(0) items
In the item A → α•Xβ, the • marks the focus: the current position in the parse tree.

NFA transitions to:
• X → •γ: shift focus to subtree rooted at X (if X is nonterminal)
• A → αX•β: move past subtree rooted at X

Outline of LR(0) parsing algorithm
• Algorithm can perform two actions:
– shift (S): no complete item is valid
– reduce (R): there is one valid item, and it is complete

• What if:
– some valid items complete, some not: S / R conflict
– more than one valid complete item: R / R conflict
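The case analysis above can be written as a short function over a set of valid items. This is a minimal sketch: the item encoding `(lhs, rhs, dot)` and the name `classify` are assumptions made for illustration.

```python
def classify(items):
    """Decide the LR(0) action for a set of valid items.

    Each item is (lhs, rhs, dot): the production lhs -> rhs with the dot
    placed before position `dot` of rhs.
    """
    complete = [it for it in items if it[2] == len(it[1])]
    incomplete = [it for it in items if it[2] < len(it[1])]
    if not complete:
        return "shift"
    if len(complete) == 1 and not incomplete:
        return "reduce"
    conflicts = []
    if incomplete:
        conflicts.append("S/R")   # some valid items complete, some not
    if len(complete) > 1:
        conflicts.append("R/R")   # more than one valid complete item
    return ", ".join(conflicts) + " conflict"


# State 2 of the grammar A -> aAb | ab: no complete item.
print(classify({("A", "aAb", 1), ("A", "ab", 1), ("A", "aAb", 0), ("A", "ab", 0)}))
# State 3: the single item A -> ab. is complete.
print(classify({("A", "ab", 2)}))
```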

Definition of LR(0) grammar
• A grammar is LR(0) if S/R, R/R conflicts never occur
– LR means parsing happens left to right and produces a rightmost derivation

• LR(0) grammars are unambiguous and have a fast parsing algorithm

• Unfortunately, they are not “expressive” enough to describe programming languages

Hierarchy of context-free grammars
Nested classes, from largest to smallest:
• context-free grammars: parse using CYK algorithm (slow)
• LR(∞) grammars
• LR(1) grammars: java, perl, python, …
• LR(0) grammars: parse using LR(0) algorithm

A grammar that is not LR(0)
S → A (1) | Bc (2)    A → aA (3) | a (4)    B → a (5) | ab (6)

input: a

possibilities:
shift (3), reduce (4), reduce (5), shift (6)

parse tree sketches (focus shown by •): a•a under S ⇒ A ⇒ aA, a• under S ⇒ A, a•c under S ⇒ Bc

valid LR(0) items:
A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a

S/R, R/R conflicts!

S → A (1) | Bc (2)    A → aA (3) | a (4)    B → a (5) | ab (6)

input: a
peek inside! (look at the next input symbol before choosing an action)

valid LR(0) items:
A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a

S → A (1) | Bc (2)    A → aA (3) | a (4)    B → a (5) | ab (6)

input: a a (first a read; peek inside at the second a)

parse tree must look like this: S ⇒ A ⇒ aA, with the focus at a•a

valid LR(0) items:
A → a•A, A → a•, B → a•, B → a•b, A → •aA, A → •a

action: shift

S → A (1) | Bc (2)    A → aA (3) | a (4)    B → a (5) | ab (6)

input: a a a (two a's read; peek inside at the third a)

parse tree must look like this: S ⇒ A ⇒ aA ⇒ aaA, with the focus at a a•…

valid LR(0) items:
A → a•A, A → a•, A → •aA, A → •a

action: shift

S → A (1) | Bc (2)    A → aA (3) | a (4)    B → a (5) | ab (6)

input: a a a (all three a's read; nothing left to peek at)

parse tree must look like this: S ⇒ A ⇒ aA ⇒ aaA ⇒ aaa, with the focus at a a a•

valid LR(0) items:
A → a•A, A → a•, A → •aA, A → •a

action: reduce

LR(0) items vs. LR(1) items
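A → aAb | ab

LR(0) item: A → a•Ab

LR(1) item: [A → a•Ab, b]

[Figure: the same partial parse tree drawn twice, once for the LR(0) item and once for the LR(1) item, which also carries the lookahead symbol b]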

LR(1) items
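• LR(1) items are of the form [A → α•β, x] or [A → α•β, ε] to represent this state in the parsing:
– [A → α•β, x]: the focus is between α and β inside the subtree A → αβ, and x is the input symbol that follows the subtree
– [A → α•β, ε]: the same, with the end of the input following the subtree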

Outline of LR(1) parsing algorithm
• Step 1: Build NFA that describes valid item updates
• Step 2: Convert NFA to DFA
– As in LR(0), DFA will have shift and reduce states

• Step 3: Run DFA on input, using stack to remember sequence of states
– Use lookahead to eliminate wrong reduce items

Recall NFA transitions for LR(0)
• States of NFA will be items (plus a start state q0)
• For every item S  •a we have a transition
q0  S  •a

• For every item A  a•Xb we have a transition
A  a•Xb X A  aX•b

• For every item A  a•Cb and production C  •d
A  a•Cb  C  •d
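The three transition rules can be run directly to obtain the DFA: following ε-transitions is a closure operation, and following an X-transition advances the dot. The sketch below does this for A → aAb | ab; the names `closure`, `goto` and `build_dfa` are illustrative, and states are numbered from 0 in discovery order, so the numbering differs from the slides.

```python
GRAMMAR = [("A", "aAb"), ("A", "ab")]  # productions of A -> aAb | ab
START = "A"
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Follow epsilon-transitions: from A -> alpha.C beta add C -> .delta."""
    items = set(items)
    todo = list(items)
    while todo:
        lhs, rhs, dot = todo.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for l2, r2 in GRAMMAR:
                new = (l2, r2, 0)
                if l2 == rhs[dot] and new not in items:
                    items.add(new)
                    todo.append(new)
    return frozenset(items)


def goto(state, symbol):
    """Follow the X-transition A -> alpha.X beta  to  A -> alpha X.beta."""
    moved = {(l, r, d + 1) for l, r, d in state if d < len(r) and r[d] == symbol}
    return closure(moved)


def build_dfa():
    # The epsilon-transitions out of q0 give the items of the start state.
    start = closure({(l, r, 0) for l, r in GRAMMAR if l == START})
    states, edges, todo = [start], {}, [start]
    while todo:
        s = todo.pop(0)
        for x in sorted({r[d] for _, r, d in s if d < len(r)}):
            t = goto(s, x)
            if t not in states:
                states.append(t)
                todo.append(t)
            edges[(states.index(s), x)] = states.index(t)
    return states, edges


states, edges = build_dfa()
for i, s in enumerate(states):
    print(i, sorted(f"{l} -> {r[:d]}.{r[d:]}" for l, r, d in s))
print(edges)
```

For this grammar the construction yields five states and five transitions, the same automaton as in the LR(0) example up to renumbering.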

NFA transitions for LR(1)
• For every item [S → •α, ε] we have a transition
q0 --ε--> [S → •α, ε]

• For every item [A → α•Xβ, x] we have a transition
[A → α•Xβ, x] --X--> [A → αX•β, x]

• For every item [A → α•Cβ, x] and production C → δ
[A → α•Cβ, x] --ε--> [C → •δ, y]
for every y in FIRST(βx)

FIRST sets
FIRST(α) is the set of terminals that occur on the left in some derivation starting from α

• Example
S → A (1) | cB (2)    A → aA (3) | a (4)    B → a (5) | ab (6)

FIRST(a) = {a}
FIRST(A) = {a}
FIRST(S) = {a, c}
FIRST(bAc) = {b}
FIRST(BA) = {a}
FIRST(ε) = ∅
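FIRST sets can be computed by a fixed-point iteration. The sketch below assumes, as in the example grammar, that no production has an empty right-hand side, so the FIRST set of a string depends only on its first symbol; the names `first_sets` and `first_of` are illustrative.

```python
def first_sets(grammar):
    """grammar maps each nonterminal to a list of nonempty right-hand sides."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                head = body[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def first_of(string, first, grammar):
    """FIRST of a string of grammar symbols (empty set for the empty string)."""
    if not string:
        return set()
    head = string[0]
    return set(first[head]) if head in grammar else {head}


GRAMMAR = {"S": ["A", "cB"], "A": ["aA", "a"], "B": ["a", "ab"]}
FIRST = first_sets(GRAMMAR)
for s in ["a", "A", "S", "bAc", "BA"]:
    print(f"FIRST({s}) =", sorted(first_of(s, FIRST, GRAMMAR)))
```

A grammar with ε-productions would need the usual extension that skips over nullable symbols; that case is deliberately left out here.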

Explaining the transitions
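• [A → α•Xβ, x] --X--> [A → αX•β, x]
The focus moves past the subtree rooted at X. The symbol x that follows the subtree rooted at A does not change.

• [A → α•Cβ, x] --ε--> [C → •δ, y], y ∈ FIRST(βx)
The focus shifts to the subtree rooted at C. The symbol y that follows this subtree is the first symbol of whatever βx derives.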

Example
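S → A (1) | Bc (2)    A → aA (3) | a (4)    B → a (5) | ab (6)

q0 --ε--> [S → •A, ε]
q0 --ε--> [S → •Bc, ε]

[S → •A, ε] --A--> [S → A•, ε]
[S → •A, ε] --ε--> [A → •aA, ε]
[S → •A, ε] --ε--> [A → •a, ε]

[S → •Bc, ε] --B--> [S → B•c, ε]
[S → •Bc, ε] --ε--> [B → •a, c]
[S → •Bc, ε] --ε--> [B → •ab, c]

...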
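The ε-transitions in this example can be followed mechanically. The sketch below computes every item reachable from q0 by ε-moves. It relies on two assumptions made only for the code: the empty string `END` stands for the end-of-input lookahead ε, and FIRST(βx) is taken to be FIRST(β) when β is nonempty and {x} otherwise, which is valid here because the grammar has no ε-productions.

```python
GRAMMAR = {"S": ["A", "Bc"], "A": ["aA", "a"], "B": ["a", "ab"]}
END = ""  # assumption: stands for the end-of-input lookahead ε


def first_of(string):
    """FIRST of a nonempty string; only its first symbol matters here."""
    seen, todo, result = set(), [string[0]], set()
    while todo:
        x = todo.pop()
        if x not in GRAMMAR:
            result.add(x)            # terminal
        elif x not in seen:
            seen.add(x)
            todo += [body[0] for body in GRAMMAR[x]]
    return result


def lookaheads(beta, x):
    """FIRST(beta x), where x is a terminal or END."""
    return first_of(beta) if beta else {x}


def closure(items):
    """Add [C -> .delta, y] for every [A -> alpha.C beta, x], y in FIRST(beta x)."""
    items = set(items)
    todo = list(items)
    while todo:
        lhs, rhs, dot, x = todo.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            c, beta = rhs[dot], rhs[dot + 1:]
            for delta in GRAMMAR[c]:
                for y in lookaheads(beta, x):
                    new = (c, delta, 0, y)
                    if new not in items:
                        items.add(new)
                        todo.append(new)
    return frozenset(items)


start = closure({("S", body, 0, END) for body in GRAMMAR["S"]})
for lhs, rhs, dot, x in sorted(start):
    print(f"[{lhs} -> {rhs[:dot]}.{rhs[dot:]}, {x or 'ε'}]")
```

The result contains the two S items, the two A items with lookahead ε, and the two B items with lookahead c.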

Convert NFA to DFA
• Each DFA state is a subset of LR(1) items, e.g.
[A → a•A, ε] [A → a•, ε] [B → a•, c] [B → a•b, c] [A → •aA, ε] [A → •a, ε]

• States can contain S/R, R/R conflicts
• But in an LR(1) grammar, lookahead can always resolve such conflicts

Example
S → A (1) | Bc (2)    A → aA (3) | a (4)    B → a (5) | ab (6)

stack   input   action   valid items
ε       abc     S        [S → •A, ε] [S → •Bc, ε] [A → •aA, ε] [A → •a, ε] [B → •a, c] [B → •ab, c]
a       bc      S        [A → a•A, ε] [A → a•, ε] [B → a•, c] [B → a•b, c] [A → •aA, ε] [A → •a, ε]
ab      c       R        [B → ab•, c]
B       c       S        [S → B•c, ε]
Bc      ε       R        [S → Bc•, ε]
S       ε

LR(k) grammars
• A context-free grammar is LR(1) if all S/R, R/R conflicts can be resolved with one symbol of lookahead
• More generally, LR(k) grammars can resolve all conflicts with k lookahead symbols
– Items have the form [A → α•β, x1...xk]

• LR(1) grammars describe the syntax of most programming languages