COP CD Unit2 PDF
Mr. Prakash C O
Asst. Professor,
Dept. of CSE, PESU
coprakasha@pes.edu
Syntax Analysis
➢ The parser obtains a string of tokens from the lexical analyzer, as shown in
Fig. 4.1, and verifies that the string of token names can be generated by the
grammar for the source language.
The Role of the Parser
There are three general types of parsing methods:
1. Universal,
2. Top-down, and
3. Bottom-up.
➢ Bottom-up methods build parse trees from the leaves and work their
way up to the root.
➢ In either case, the input to the parser is scanned from left to right, one
symbol at a time.
Fig: Parse Tree
Grammar:
The Role of the Parser
LL grammar and LR grammar
LL grammar:
E → T E'
E' → + T E' | ϵ
T → F T'
T' → * F T' | ϵ
F → (E) | id

LR grammar:
E → E + T | T
T → T * F | F
F → (E) | id
The Role of the Parser
LL grammar and LL parser
➢ The LL parser reads input text Left to right within each line, and top to bottom
across the lines of the full input file.
The second L in LL means that the parser produces a Leftmost derivation: it
does a top-down parse.
➢ Top-down parsers build parse trees from the top (root) to the bottom (leaves).
➢ The LR parser reads input text Left to right within each line, and top to bottom
across the lines of the full input file.
The R means that the parser produces a Rightmost derivation in reverse: it does
a bottom-up parse.
➢ Bottom-up parsers build parse trees from the leaves and work their way up to
the root.
LL parsing, also known as top-down parsing, vs. LR parsing, also known as bottom-up parsing:
▪ LL starts with only the root nonterminal on the stack; LR ends with only the root nonterminal on the stack.
▪ LL reads a terminal when it pops one off the stack; LR reads terminals while it pushes them on the stack.
▪ LL parsers are often called "predictive parsers"; LR parsers are often called "shift-reduce parsers."
The Role of the Parser
E -> E + T | T
T -> T * F | F (4.1)
F -> (E) | id
The above expression grammar belongs to the class of LR grammars
that are suitable for bottom-up parsing.
This grammar can be adapted to handle additional operators and
additional levels of precedence.
Representative Grammars
➢ The following non-left-recursive variant of the expression grammar (4.1) will
be used for top-down parsing:
E → T E'
E' → + T E' | ϵ
T → F T'
T' → * F T' | ϵ
F → (E) | id     (4.2)
E → E + E | E * E | ( E ) | id (4.3)
Here, E represents expressions of all types. Grammar (4.3) permits more than
one parse tree for expressions like a + b*c.
Syntax Analysis
Few languages have been designed with error handling in mind, even
though errors are so commonplace.
➢ Planning the error handling right from the start can both simplify the
structure of a compiler and improve its handling of errors.
Syntax Error Handling
Common programming errors can occur at many different levels: lexical, syntactic,
semantic, and logical.
➢ Example: the return of a value in a Java method with result type void is a
semantic error, not a syntax error.
Syntax Error Handling
Viable-prefix Property
➢ Goal: detect an error as soon as possible, without consuming unnecessary input.
➢ How: detect an error as soon as the prefix of the input seen so far does not match a
prefix of any string of the language. This goal is challenging to realize for all kinds of
errors.
➢ At the very least, the parser must report the place in the source program where an
error is detected, because there is a good chance that the actual error occurred
within the previous few tokens.
➢ A common strategy is to print the offending line with a pointer to the position at
which the error is detected.
Error-Recovery Strategies
Error-Recovery Strategies
➢ Once an error is detected, how should the parser recover/react? It can either quit
with an informative error message, or attempt to recover and continue parsing.
In the first case, the user must recompile from scratch after possibly a
trivial fix.
➢ Error recovery:
➢ Possible adjustments:
➢ delete tokens
➢ insert tokens
➢ substitute tokens
Error-Recovery Strategies
1. Panic-Mode Recovery
2. Phrase-Level Recovery
3. Error Productions
4. Global Correction
Error-Recovery Strategies
1. Panic-Mode Recovery
On discovering an error, the parser discards input symbols one at a time until one of a
designated set of synchronizing tokens (usually delimiters such as semicolon or }) is found.
This is a crude method but often turns out to be the best method.
In situations where multiple errors in the same statement are rare, this
method may be quite adequate.
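The skip-to-synchronizing-token loop can be sketched in Python (a minimal sketch; the token list and the SYNC set here are assumptions for illustration, not from the slides):

```python
# A minimal sketch of panic-mode recovery: on an error, discard input
# tokens until a synchronizing token (here ';' or '}') is seen.
SYNC = {";", "}"}

def skip_to_sync(tokens, i):
    """Return the index just past the next synchronizing token."""
    while i < len(tokens) and tokens[i] not in SYNC:
        i += 1
    return min(i + 1, len(tokens))   # consume the sync token too, if any

# skip_to_sync(["x", "+", ";", "y"], 0) -> 3  (parsing resumes at "y")
```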
Error-Recovery Strategies
2. Phrase-Level Recovery (Statement-Mode Recovery)
➢ On discovering an error, the parser performs a local correction on the remaining
input; for example, it may replace a comma by a semicolon, delete an extraneous
semicolon, or insert a missing semicolon.
➢ The parser can then generate appropriate error diagnostics about the
erroneous construct that has been recognized in the input.
Error-Recovery Strategies
3. Error Productions Cont…
➢ If we have an idea of common errors that might occur, we can include the
errors in the grammar at hand.
Here, the last two alternatives are error situations. We change the grammar as:
E → +E | -E | *A | /A
A → E
Hence, once the parser encounters *A, it sends an error message asking the user if
a unary "*" was really intended.
If this is used then, during parsing appropriate error messages can be generated and
parsing can be continued.
Error-Recovery Strategies
4. Global Correction
➢ Given an incorrect input string x and grammar G, these algorithms will find
a parse tree for a closest error-free string y, such that the number of
insertions, deletions, and changes of tokens required to transform x into y is
as small as possible.
➢ These methods are in general too costly to implement in terms of time and
space, so these techniques are currently only of theoretical interest.
Syntax Analysis
Context-Free Grammars
Context-Free Grammars
➢ Grammars are used to specify the syntax of a language. A context-free grammar has:
➢ Set of terminals (tokens)
➢ Set of nonterminals
➢ Start symbol (one of the nonterminals)
➢ Set of productions
➢ Example 1:
Context-Free Grammars
➢ Example 2:
➢ Derivation:
➢ given the grammar (i.e. productions)
➢ begin with the start symbol
➢ repeatedly replace a nonterminal by the body of one of its productions
➢ We obtain the language defined by the grammar (i.e. the set of
terminal strings derivable from the start symbol)
➢ Example:
➢ Parsing:
Given a string of terminals
Figure out how to derive it from the start symbol of the grammar
Parse Tree
Parse Tree
➢ E → E + E | E * E | ( E ) | id (4.3)
9-5-2
▪ Will ‘5’ go with the ‘-’ on the left or the one on the right?
▪ If it goes with the one on the left: (9-5)-2 we say that the operator ‘-’ is
left-associative
▪ If it goes with the one on the right: 9-(5-2) we say that the operator ‘-’ is
right-associative
Context-Free Grammars
Associativity and Precedence of Operators
We say '*' has higher precedence than '-' if it takes its operands before
'-' does.
➢ NFA to CFG
Context-Free Grammars
(a|b)*abb
Context-Free Grammars
Question Worth Asking
2. Eliminating left-recursion
▪ Top-down parsers cannot handle left-recursive grammars. During
parsing, it is possible for a top-down parser to loop forever, so a
transformation is needed to eliminate left recursion.
3. Left factoring
▪ Elimination of common prefixes.
Context-Free Grammars
Eliminating Ambiguity
Note: When there are multiple ifs and a single else, it is not clear which if the else
should go with; this is called the dangling-else problem.
Ambiguous Grammar:
(4.14)
Unambiguous Grammar:
Fig. 4.10
Context-Free Grammars
Eliminating Ambiguity
(4.15)
Fig. 4.10
(4.14)
The idea is that a statement appearing between a then and an else must be "matched" ;
that is, the interior statement must not end with an unmatched or open then.
A matched statement is either an if-then-else statement containing no open statements
or it is any other kind of unconditional statement.
Thus, we may use the grammar in Fig. 4.10. This grammar generates the same strings as
the dangling-else grammar (4.14), but it allows only one parsing for string (4.15); namely,
the one that associates each else with the closest previous unmatched then.
Context-Free Grammars
Eliminating Ambiguity
A grammar containing the productions
A → AA | α
is ambiguous because the string AAA has more than one parse tree.
It can be rewritten unambiguously as either:
▪ A → AB | B
B → α
or
▪ A → BA | B
B → α
Context-Free Grammars
Eliminating Left-Recursion
➢ Example:
➢ Example 1: Grammar (4.1) after eliminating left recursion:
E → T E'
E' → + T E' | ϵ
T → F T'
T' → * F T' | ϵ
F → (E) | id
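The transformation A → Aα | β ⇒ A → βA', A' → αA' | ϵ can be sketched in Python (a minimal sketch that handles only immediate left recursion; the grammar encoding as lists of symbols is my own, not from the slides):

```python
def eliminate_immediate_left_recursion(nt, productions):
    """Replace A -> A a1 | ... | b1 | ... with A -> b A', A' -> a A' | eps."""
    recursive = [p[1:] for p in productions if p and p[0] == nt]   # the alphas
    nonrec    = [p for p in productions if not p or p[0] != nt]    # the betas
    if not recursive:
        return {nt: productions}
    new_nt = nt + "'"
    return {
        nt:     [beta + [new_nt] for beta in nonrec],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],  # [] = epsilon
    }

g = eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]])
# g["E"]  -> [["T", "E'"]]
# g["E'"] -> [["+", "T", "E'"], []]
```

Applying the same step to T → T*F | F yields T → FT', T' → *FT' | ϵ, matching Example 1 above.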
When the choice between two alternative A-productions is not clear, we may be
able to rewrite the productions to defer the decision until enough of the input has
been seen that we can make the right choice.
In general, if A → αβ1 | αβ2 are two A-productions, and the input begins with a
nonempty string derived from α, we do not know whether to expand A to αβ1 or to αβ2.
However, we may defer the decision by expanding A to αA'. Then, after seeing the input
derived from α, we expand A' to β1 or to β2. The left-factored productions are
A → αA' and A' → β1 | β2.
Context-Free Grammars
Left Factoring (Elimination of common prefixes)
How?
In left factoring,
▪ We make one production for each common prefix.
The grammar obtained after the process of left factoring is called as Left
Factored Grammar.
Context-Free Grammars
Left Factoring (Elimination of common prefixes)
Important Note:
▪ During leftmost derivation, when the choice between two alternative
A-productions is not clear, making the right choice may need k
symbols of lookahead on the input.
Example:
Context-Free Grammars
Left Factoring
Example:
Context-Free Grammars
Do left factoring in the following grammars-
4. S → a | ab | abc | abcd
Step-01:
A → aA'
A' → AB | Bc | Ac
Step-02:
A → aA'
A' → AD | Bc
D → B | c
Step-01:
S → bSS' / a
S' → SaaS / SaSb / b
Step-02:
S → bSS' / a
S' → SaA / b
A → aS / Sb
Step-01:
S → aS' / b
S' → SSbS / SaSb / bb
Step-02:
S → aS' / b
S' → SA / bb
A → SbS / aSb
4. S → a / ab / abc / abcd
Context-Free Grammars
Do left factoring in the following grammars-
4. S → a / ab / abc / abcd
Step-01:
S → aS'
S' → b / bc / bcd / ∈
Again, this is a grammar with common prefixes.
Step-02:
S → aS'
S' → bA / ∈
A → c / cd / ∈
Again, this is a grammar with common prefixes.
Step-03:
S → aS'
S' → bA / ∈
A → cB / ∈
B → d / ∈
This is a left factored grammar.
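One factoring step (pull out a length-1 common prefix) can be sketched in Python; repeating it until no group remains reproduces Steps 01 to 03 above. The grammar encoding and function name are my own, not from the slides:

```python
from collections import defaultdict

def left_factor_once(nt, productions, suffix="'"):
    """One step of left factoring: group alternatives by their first symbol
    and factor each shared first symbol into a fresh nonterminal."""
    groups = defaultdict(list)
    for p in productions:
        groups[p[0] if p else None].append(p)
    result, fresh = {nt: []}, 0
    for head, alts in groups.items():
        if head is None or len(alts) == 1:
            result[nt].extend(alts)           # no common prefix here
        else:
            fresh += 1
            new_nt = nt + suffix * fresh
            result[nt].append([head, new_nt])
            result[new_nt] = [p[1:] for p in alts]   # may contain [] = epsilon
    return result

g = left_factor_once("S", [["a"], ["a", "b"], ["a", "b", "c"], ["a", "b", "c", "d"]])
# g["S"]  -> [["a", "S'"]]
# g["S'"] -> [[], ["b"], ["b", "c"], ["b", "c", "d"]]  (factor S' again, as in Step-02)
```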
Context-Free Grammars
Do left factoring in the following grammars-
S → aS'
S' → Ad / B
A → aA'
A' → b / ∈
B → ccd / ddc
Parsing
Top-Down Parsing
Top-Down Parsing
Top-down parsing can be viewed as
▪ The problem of constructing a parse tree for the input string, starting from the
root and creating the nodes of the parse tree in preorder (depth-first).
Grammar:
Top-Down Parsing
Show the sequence of parse trees for the input id+id*id
E⇒ TE'
⇒ FT'E'
⇒ idT'E'
⇒ idE'
⇒ id+TE'
⇒ id+FT'E'
⇒ id+idT'E'
⇒ id+id*FT'E'
⇒ id+id*idT'E'
⇒ id+id*idE'
⇒ id+id*id
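The derivation above corresponds directly to a recursive-descent parser with one procedure per nonterminal. A minimal Python sketch (class and token names are my own; tokens are assumed to be pre-lexed strings like 'id'):

```python
# Recursive-descent parser for: E -> T E'; E' -> + T E' | eps;
# T -> F T'; T' -> * F T' | eps; F -> ( E ) | id
class Parser:
    def __init__(self, tokens):
        self.toks, self.pos = tokens, 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else '$'

    def match(self, t):
        if self.peek() != t:
            raise SyntaxError(f"expected {t}, got {self.peek()}")
        self.pos += 1

    def E(self):
        self.T(); self.Ep()

    def Ep(self):
        if self.peek() == '+':                 # E' -> + T E'; otherwise epsilon
            self.match('+'); self.T(); self.Ep()

    def T(self):
        self.F(); self.Tp()

    def Tp(self):
        if self.peek() == '*':                 # T' -> * F T'; otherwise epsilon
            self.match('*'); self.F(); self.Tp()

    def F(self):
        if self.peek() == '(':
            self.match('('); self.E(); self.match(')')
        else:
            self.match('id')

def accepts(tokens):
    p = Parser(tokens)
    try:
        p.E()
        return p.peek() == '$'                 # success only if all input consumed
    except SyntaxError:
        return False

# accepts(['id', '+', 'id', '*', 'id']) -> True
```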
Top-Down Parsing
Grammars should be free from
left recursion and ambiguities.
Table driven
This parsing technique recursively parses the input to build a parse tree,
which may or may not require backtracking. If the grammar is not left
factored, backtracking cannot be avoided.
RDP can be used to parse different types of code such as XML or other
inputs.
Top-Down Parsing
Recursive-Descent Parsing
2. return a pointer to the root of the parse tree for the non-terminal.
RDP execution begins with the procedure for the start symbol,
which halts and announces success if its procedure body scans
the entire input string.
proc A {
Match the current token with a, and move to the next token;
Call ‘B’;
Match the current token with b, and move to the next token;
}
Top-Down Parsing
Recursive-Descent Parsing
proc A {   // Non-backtracking RDP procedure for a left-factored grammar with one-symbol lookahead
case (current token) {
'a': Match the current token with a, and move to the next token;
Call 'B';
Match the current token with b, and move to the next token;
'b': Match the current token with b, and move to the next token;
Call 'A';
Call 'B';
}
}
Top-Down Parsing
Recursive-Descent Parsing
proc C {
Match the current token with f, and move to the next token;
}
proc B
{
case (current token/current input symbol){
b: Match the current token with b, and move to the next token;
Call B
e,d: do nothing
}
}
Top-Down Parsing
Example of Backtracking
Based on the information the parser currently has about the input,
▪ If this choice leads to a dead end, the parser would have to backtrack
to that decision point, moving backwards through the input, and
start again making a different choice and so on until it either found the
production that was the appropriate one or ran out of choices.
▪ Grammar:
• aa
• aaaa
• aaaaaa
• aaaaaaaa
Top-Down Parsing
Demonstrate RDP with Backtracking for the given input
and grammar.
Input: w=read
Grammar:
X → oa | ea
Z → ai
Top-Down Parsing
Example of RDP with Non-Backtracking
Input: cad
Grammar (left factored):
S → cAd
(Trace table columns: Expansion | Input so far | Action)
Example:
S → ABCDE
A → a | ϵ
B → b | ϵ
C → c | ϵ
D → d | ϵ
E → e | ϵ
FIRST:
S: a b c d e ϵ
A: a ϵ
B: b ϵ
C: c ϵ
D: d ϵ
E: e ϵ
Top-Down Parsing
FIRST and FOLLOW
❑ Example:
FIRST
E ( id
E' + ϵ
T ( id
T' * ϵ
F ( id
Top-Down Parsing
FIRST and FOLLOW
      FIRST     FOLLOW
E:    ( id      ) $
E':   + ϵ       ) $
T:    ( id      + ) $
T':   * ϵ       + ) $
F:    ( id      * + ) $
❑ Exercise - 1:
      FIRST       FOLLOW
S:    u y z w x   $
U:    u y z ϵ     w x y z
V:    w x ϵ       y z
W:    y z         v $
❑ Exercise - 2:
      FIRST    FOLLOW
S:    a b c    $
A:    a ϵ      b c
B:    b ϵ      c
C:    c        d e $
D:    d ϵ      e $
E:    e ϵ      $
❑ Exercise - 3:
      FIRST      FOLLOW
S:    a c b d    $
B:    a ϵ        b
C:    c ϵ        d
FIRST and FOLLOW let the parser choose between two or more r.h.s. by predicting the
first symbol that each r.h.s. can derive.
❑ Exercise - 4:
❑ Exercise - 5:
❑ Even if there is only one r.h.s. we can still use them to tell us
whether or not we have an error - if the current input symbol
cannot be derived from the only r.h.s. available, then we know
immediately that the sentence does not belong to the grammar,
without having to (attempt to) finish the parse.
Top-Down Parsing
Why FIRST in Compiler Design?
If the compiler knows in advance the "first character of the string produced when a
production rule is applied", it can compare it to the current character or token in the
input string and wisely decide which production rule to apply.
Thus, in the example above, after reading character 'c' in the input string and applying
S → cAd, if the next character in the input is 'a', the parser ignores the production rule
A → bc (because 'b', not 'a', is the first character of the string produced by that rule)
and directly uses the production rule A → a.
Top-Down Parsing
In LL(1):
▪ the first "L" stands for scanning the input from left to right,
▪ the second "L" for producing a leftmost derivation, and
▪ the "1" for using one input symbol of lookahead at each step to make parsing
action decisions.
Top-Down Parsing
LL(1) Grammars
S → Aa S → Aa S → A|xb
B →x
o S → Xb | Yc
o X → a
o Y → a
❑ By seeing only the first input symbol a, you cannot know whether to apply the
production S → Xb or S → Yc, because a is in the FIRST set of both X and Y.
❑ By seeing only the first input symbol f, you cannot decide whether to
apply the production A → fe or A → ϵ, because f is in both the FIRST set
of A and the FOLLOW set of A (A can be parsed as epsilon and B as f).
X → YaYb|ZbZa
Y →ϵ
Z →ϵ
S → ABA
A →aA|ϵ
B →b|ϵ
Top-Down Parsing
LL(1) Grammars
X → YaYb | ZbZa
Y → ϵ
Z → ϵ
FIRST(YaYb) = {a} and FIRST(ZbZa) = {b} are disjoint, so the grammar is LL(1).
Predictive parsers can be constructed for LL(1) grammars since the proper
production to apply for a nonterminal can be selected by looking only at the
current input symbol.
then the keywords if, while, and the symbol { tell us which alternative is the only
one that could possibly succeed if we are to find a statement.
Top-Down Parsing
LL(1) Grammars
▪ From the FIRST and FOLLOW sets for a grammar, we shall construct
Predictive parsing tables.
FIRST and FOLLOW sets are also useful during bottom-up parsing.
Top-Down Parsing
LL(1) Grammars
Example:
S → x | xy | xyz
Note: This algorithm collects the information from FIRST and FOLLOW sets into a predictive
parsing table M[A,a], a two-dimensional array, where A is a nonterminal, and a is a terminal
or the symbol $, the input end marker.
Top-Down Parsing
Example 1:
      FIRST     FOLLOW
E:    ( id      ) $
E':   + ϵ       ) $
T:    ( id      + ) $
T':   * ϵ       + ) $
F:    ( id      * + ) $
Example 2: S → iEtSS' | a, S' → eS | ϵ, E → b
      FIRST    FOLLOW
S:    i a      e $
S':   e ϵ      e $
E:    b        t
The parsing table is in Fig. 4.18. The entry M[S', e] contains both
S' → eS and S' → ϵ. The grammar is ambiguous (dangling else), so it is not LL(1).
Top-Down Parsing
Example 3: Construct the parsing table for the following grammar.
S → ABCDE
A → a | ϵ
B → b | ϵ
C → c | ϵ
D → d | ϵ
E → e | ϵ
      FIRST          FOLLOW
S:    a b c d e ϵ    $
A:    a ϵ            b c d e $
B:    b ϵ            c d e $
C:    c ϵ            d e $
D:    d ϵ            e $
E:    e ϵ            $
      a          b          c          d          e          $
S     S→ABCDE    S→ABCDE    S→ABCDE    S→ABCDE    S→ABCDE    S→ABCDE
A     A→a        A→ϵ        A→ϵ        A→ϵ        A→ϵ        A→ϵ
B                B→b        B→ϵ        B→ϵ        B→ϵ        B→ϵ
C                           C→c        C→ϵ        C→ϵ        C→ϵ
D                                      D→d        D→ϵ        D→ϵ
E                                                 E→e        E→ϵ
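The two table-filling rules (put A → α in M[A, a] for every a in FIRST(α); if α is nullable, also for every b in FOLLOW(A)) can be sketched in Python for this grammar. The FIRST/FOLLOW sets are hardcoded from the slide; the encoding is my own:

```python
# Build the predictive parsing table M[A, a] for the grammar of Example 3,
# using the FIRST/FOLLOW sets from the slide ('' stands for epsilon).
GRAMMAR = {
    "S": [["A", "B", "C", "D", "E"]],
    "A": [["a"], []], "B": [["b"], []], "C": [["c"], []],
    "D": [["d"], []], "E": [["e"], []],
}
FIRST = {"S": {"a", "b", "c", "d", "e", ""}, "A": {"a", ""}, "B": {"b", ""},
         "C": {"c", ""}, "D": {"d", ""}, "E": {"e", ""}}
FOLLOW = {"S": {"$"}, "A": {"b", "c", "d", "e", "$"}, "B": {"c", "d", "e", "$"},
          "C": {"d", "e", "$"}, "D": {"e", "$"}, "E": {"$"}}

def first_of_seq(seq):
    out = set()
    for X in seq:
        fx = FIRST[X] if X in GRAMMAR else {X}
        out |= fx - {""}
        if "" not in fx:
            return out
    out.add("")
    return out

def build_table():
    M = {}
    for A, prods in GRAMMAR.items():
        for p in prods:
            f = first_of_seq(p)
            for a in f - {""}:          # rule 1: a in FIRST(alpha)
                M[A, a] = p
            if "" in f:                 # rule 2: alpha =>* eps, use FOLLOW(A)
                for b in FOLLOW[A]:
                    M[A, b] = p
    return M

M = build_table()
# M["A", "a"] -> ["a"];  M["A", "b"] -> []  (A -> eps);  ("B", "a") is not in M
```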
Top-Down Parsing
Exercise
S → +SS | * SS | a
Write predictive parsing table
FIRST(S) = {+, *, a}    FOLLOW(S) = {+, *, a, $}
      +        *        a      $
S     S→+SS    S→*SS    S→a
Top-Down Parsing
Exercise
S → Aa
A → bA|B
B → Cc
C → bC|ϵ
If w is the input that has been matched so far, then the stack holds a
sequence of grammar symbols α such that S ⇒* wα.
Recursive grammar:
S → a | ^ | (L)
L → L,S | S
Non-recursive grammar:
S → a | ^ | (L)
L → SA
A → ,SA | ϵ
      FIRST    FOLLOW
S:    a ^ (    $ , )
L:    a ^ (    )
A:    , ϵ      )
LL(1) table/Predictive parsing table:
      a        ^        (        )        ,         $
S     S→a      S→^      S→(L)
L     L→SA     L→SA     L→SA
A                                A→ϵ      A→,SA
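The table-driven (nonrecursive) predictive parsing loop for this grammar can be sketched in Python (a minimal sketch; the table is transcribed from the slide and the function names are my own):

```python
# Table-driven predictive parse of the non-recursive grammar
# S -> a | ^ | (L);  L -> S A;  A -> ,S A | eps
M = {
    ("S", "a"): ["a"], ("S", "^"): ["^"], ("S", "("): ["(", "L", ")"],
    ("L", "a"): ["S", "A"], ("L", "^"): ["S", "A"], ("L", "("): ["S", "A"],
    ("A", ")"): [], ("A", ","): [",", "S", "A"],
}
NONTERMINALS = {"S", "L", "A"}

def parse(tokens, start="S"):
    toks = tokens + ["$"]
    stack, i = ["$", start], 0
    while stack[-1] != "$":
        top, a = stack[-1], toks[i]
        if top not in NONTERMINALS:          # terminal on top: must match input
            if top != a:
                return False
            stack.pop(); i += 1
        elif (top, a) in M:                  # expand by production M[top, a]
            stack.pop()
            stack.extend(reversed(M[top, a]))
        else:
            return False                     # empty table entry = error
    return toks[i] == "$"

# parse(["(", "a", ",", "a", ")"]) -> True
```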
S → AaAb | BbBa
A → λ
B → λ
Top-Down Parsing
Problem 4:
S → AB
A → a | λ
B → b | λ
f) If the grammar is in LL(1), parse the strings: abce, cde and empty string
S → ABCDE
A → a | λ
B → b | λ
C → c | λ
D → d | λ
E → e | λ
S → ABCDE
A → a | ϵ
B → b | ϵ
C → c | ϵ
D → d | ϵ
E → e | ϵ
      FIRST          FOLLOW
S:    a b c d e ϵ    $
A:    a ϵ            b c d e $
B:    b ϵ            c d e $
C:    c ϵ            d e $
D:    d ϵ            e $
E:    e ϵ            $
      a          b          c          d          e          $
S     S→ABCDE    S→ABCDE    S→ABCDE    S→ABCDE    S→ABCDE    S→ABCDE
A     A→a        A→ϵ        A→ϵ        A→ϵ        A→ϵ        A→ϵ
B                B→b        B→ϵ        B→ϵ        B→ϵ        B→ϵ
C                           C→c        C→ϵ        C→ϵ        C→ϵ
D                                      D→d        D→ϵ        D→ϵ
E                                                 E→e        E→ϵ
An error is detected during predictive parsing when:
1. the terminal on top of the stack does not match the next input symbol, or
2. nonterminal A is on top of the stack, a is the next input symbol, and M[A, a] is
the error (empty) entry.
Top-Down Parsing
Error Recovery in Predictive Parsing
▪ Modify the stack and/or the input string to try and reach state
from which we can continue.
2. Phrase-level recovery
Idea:
➢ Decide on a set of synchronizing tokens.
➢ When an error is found and there's a nonterminal at the top of the stack,
➢ When an error is found and there is a terminal at the top of the stack,
➢ We might add keywords that begins statements to the synchronizing sets for the
nonterminals generating expressions.
➢ We can add to the synchronizing set of a lower-level construct the symbols that
begin higher-level constructs.
Error Recovery in Predictive Parsing
▪ the entry is "synch," then the nonterminal on top of the stack is popped in an
attempt to resume parsing.
▪ If a token on top of the stack does not match the input symbol, then we pop
the token from the stack, as mentioned above.
On the erroneous input + id * +id, the parser and error recovery mechanism
of Fig. 4.22 behave as in Fig. 4.23.
Error routines typically remove tokens from the input, and/or pop an
item from the stack.
Bottom-Up Parsing
Bottom-Up Parsing
A bottom-up parse corresponds to the construction of a parse tree for an input string
beginning at the leaves (the bottom) and working up towards the root (the top).
Bottom-Up Parsing
A general style of bottom-up parsing is known as shift-reduce parsing.
Right-most derivation:
Reductions
The key decisions during bottom-up parsing are about when to reduce
and about what production to apply, as the parse proceeds.
Bottom-Up Parsing
Reductions
Example: The reductions will be discussed in terms of the sequence of strings
➢ Now, we have a choice between reducing the string T, which is the body of E →
T, and the string consisting of the second id, which is the body of F → id. Rather
than reduce T to E, the second id is reduced to T, resulting in the string T * F. This
string then reduces to T. The parse completes with the reduction of T to the start
symbol E.
Bottom-Up Parsing
Reductions
Right-most derivation:
---------------------------------------→
The rightmost derivation in reverse
Bottom-Up Parsing
Handle Pruning
Shift-Reduce Parsing
Bottom-Up Parsing: Shift-Reduce Parsing
➢ In Shift-reduce parsing
▪ A Stack holds grammar symbols.
▪ The handle always appears at the top of the stack just before it is
identified as the handle.
▪ $ marks the bottom of the stack and also the right end of the input.
Parse is successful if stack contains only the start symbol when the
input stream ends.
Bottom-Up Parsing: Shift-Reduce Parsing
1. Shift. Shift the next input symbol onto the top of the stack.
2. Reduce. The right end of the string to be reduced must be at the top of the stack;
locate its left end within the stack and decide with what nonterminal to replace it.
3. Accept. Announce successful completion of parsing.
4. Error. Discover a syntax error and call an error-recovery routine.
▪ Technically, these CFGs are not in the LR(k) class of grammars; we refer
to them as non-LR grammars.
Bottom-Up Parsing: Shift-Reduce Parsing
Note:
LR Parsing
LR Parsing
▪ In LR(k) parsing, the L is for left-to-right scanning of the input, the R for
constructing a rightmost derivation in reverse, and the k for the number of
input symbols of lookahead that are used in making parsing decisions.
For example, with stack contents $T and next input symbol * in Fig. 4.28,
how does the parser know that T on the top of the stack is not a handle, so
the appropriate action is to shift and not to reduce T to E?
LR Parsing
Items and the LR(0) Automaton
How does a shift-reduce LR-parser know when to shift and when to reduce?
▪ An LR parser makes shift-reduce decisions by using automaton
(maintaining states) to keep track of where we are in a parse.
▪ A → •XYZ
▪ A → X•YZ
▪ A → XY•Z
▪ A → XYZ•
For example
➢ Item A → •XYZ indicates that we hope to see next an input string derivable from
XYZ.
➢ Item A → X•YZ indicates that we have just seen on the input a string derivable
from X and that we hope next to see a string derivable from YZ.
➢ Item A → XYZ• indicates that we have seen a string derivable from XYZ on the
input and that it may be time to reduce XYZ to A.
One collection of sets of LR(0) items, called the canonical LR(0) collection,
provides the basis for constructing a DFA that is used to make parsing
decisions. Such an automaton is called an LR(0) automaton.
LR Parsing
▪ Example:
G’ :
G:
LR Parsing
Closure of Item Sets
Examples:
LR Parsing
Closure of Item Sets
Closure algorithm
LR Parsing
Closure of Item Sets
Example 4.40 : Construct the Closure of Item Sets for the following
augmented grammar
LR Parsing
Closure of Item Sets
LR Parsing
The Function GOTO
▪ The second useful function is GOTO(I,X) where I is a set of items and X is a
grammar symbol.
▪ The GOTO function is used to define the transitions in the LR(0) automaton for
a grammar.
▪ GOTO(I,X) is defined to be the closure of the set of all items [A → αX•β] such
that [A → α•Xβ] is in I.
Example: GOTO(I1,+)
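CLOSURE and GOTO can be coded directly from these definitions. A minimal Python sketch (items are encoded as (head, body, dot-position) tuples; this encoding is my own, not from the slides):

```python
def closure(items, grammar):
    """items: set of (head, body, dot). Add [B -> .gamma] for every
    nonterminal B that appears right after a dot, until no new items appear."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(items):
            if dot < len(body) and body[dot] in grammar:   # dot before nonterminal B
                for prod in grammar[body[dot]]:
                    item = (body[dot], prod, 0)
                    if item not in items:
                        items.add(item)
                        changed = True
    return items

def goto(items, X, grammar):
    """GOTO(I, X): move the dot over X in every item of I, then take the closure."""
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == X}
    return closure(moved, grammar)

# Augmented grammar S' -> S; S -> SS+ | SS* | a (bodies as tuples, so items hash)
G = {"S'": (("S",),), "S": (("S", "S", "+"), ("S", "S", "*"), ("a",))}
I0 = closure({("S'", ("S",), 0)}, G)
I1 = goto(I0, "S", G)
# I0 contains [S' -> .S], [S -> .SS+], [S -> .SS*], [S -> .a]
```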
LR Parsing
Algorithm to construct C, the canonical collection of sets of LR(0)
items for an augmented grammar G'
LR Parsing
E' → E
▪ Kernel items: the initial item E' → •E and all items whose dots are not at the left end.
▪ Nonkernel items: all items with their dots at the left end, except E' → •E.
▪ Nonkernel items need not be stored explicitly, since they can be regenerated by closure.
LR Parsing
Exercise 2: Construct the LR(0) automaton for the following grammar.
LR Parsing
Exercise 3: Construct the LR(0) automaton for the following grammar.
LR Parsing
The LR-Parsing
LR parser consists of
1. an input,
2. an output,
3. a stack,
4. a driver program, and
5. a parsing table with two parts (ACTION and GOTO).
The parsing program reads characters from an input buffer one at a time.
Where a shift-reduce parser would shift a symbol, an LR parser shifts a state.
Each state summarizes the information contained in the stack below it.
We shall refer
▪ to the parsing table constructed by the SLR method as an SLR table, and
▪ to an LR parser using the SLR table as an SLR parser.
The SLR method begins with LR(0) items and LR(0) automata.
That is, given a grammar, G, we augment G to produce G', with a new start symbol
S'. From G', we construct C, the canonical collection of sets of items for G' together
with the GOTO function.
SLR Parser
E' → E
SLR Parser
Constructing SLR-Parsing Tables
SLR Parser
Constructing SLR-Parsing Tables
1. ...
2. ...
SLR Parser
Constructing SLR-Parsing Tables
An LR parser using the SLR(1) table for G is called the SLR(1) parser for G,
and a grammar having an SLR(1) parsing table is said to be SLR(1).
We usually omit the "(1)" after the "SLR," since we shall not deal here
with parsers having more than one symbol of lookahead.
In SLR parsers, the lookahead sets are determined directly from the
grammar, without considering the individual states and transitions.
SLR Parser
Constructing SLR-Parsing Table:
E' → E
FOLLOW:
E: + ) $
T: + * ) $
F: + * ) $
SLR Parser
Notation: si means shift and stack state i (the number is a state number);
rj means reduce by the production numbered j (the number is a production number).
Example 4.47 : Let us construct the SLR table for the augmented expression grammar.
FOLLOW:
E: + ) $
T: + * ) $
F: + * ) $
S -> SA
S -> A
A -> a
let a be the first symbol of the input;
while(1) {
let s be the state on top of the stack;
if (ACTION[s, a] = shift t) {
push a and then t onto the stack;
let a be the next input symbol;
} else if (ACTION[s, a] = reduce A → β) {
pop 2*|β| symbols off the stack;
let state t now be on top of the stack;
push A and then GOTO[t, A] onto the stack;
} else if (ACTION[s, a] = accept) break; /* parsing is done */
else call error-recovery routine;
}
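The driver above can be made runnable. A minimal Python sketch using the SLR ACTION/GOTO tables for S → SS+ | SS* | a from Exercise 4.6.2 (the table encoding and function name are my own):

```python
# Runnable version of the LR driver (Method-2: grammar symbols and states
# are both pushed). Tables are the SLR tables for S -> SS+ | SS* | a.
PRODS = {1: ("S", 3), 2: ("S", 3), 3: ("S", 1)}   # number -> (head, |body|)
ACTION = {
    (0, "a"): ("s", 2),
    (1, "a"): ("s", 2), (1, "$"): ("acc",),
    (2, "a"): ("r", 3), (2, "+"): ("r", 3), (2, "*"): ("r", 3), (2, "$"): ("r", 3),
    (3, "a"): ("s", 2), (3, "+"): ("s", 4), (3, "*"): ("s", 5),
    (4, "a"): ("r", 1), (4, "+"): ("r", 1), (4, "*"): ("r", 1), (4, "$"): ("r", 1),
    (5, "a"): ("r", 2), (5, "+"): ("r", 2), (5, "*"): ("r", 2), (5, "$"): ("r", 2),
}
GOTO = {(0, "S"): 1, (1, "S"): 3, (3, "S"): 3}

def lr_parse(tokens):
    toks, i, stack = tokens + ["$"], 0, ["$", 0]
    while True:
        s, a = stack[-1], toks[i]
        act = ACTION.get((s, a))
        if act is None:
            return False                      # empty entry = error
        if act[0] == "s":                     # shift: push symbol, then state
            stack += [a, act[1]]
            i += 1
        elif act[0] == "r":                   # reduce A -> beta
            head, n = PRODS[act[1]]
            del stack[len(stack) - 2 * n:]    # pop 2*|beta| entries
            stack += [head, GOTO[(stack[-1], head)]]
        else:                                 # accept
            return True

# lr_parse(["a", "a", "+"]) -> True  (postfix expression a a +)
```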
SLR Parsing (Method-2)    Input: id+id$
Stack        Input      Action
$0           id+id$     ...
$0E1+6F3     $          ACTION[3,$] = r4 (T → F): pop 2*|F| symbols off the stack,
                        push T and then GOTO[6,T] = 9 onto the stack.
$0E1+6T9     $          ACTION[9,$] = r1 (E → E+T): pop 2*|E+T| symbols off the stack,
                        push E and then GOTO[0,E] = 1 onto the stack.
$0E1         $          ACTION[1,$] = Accept
SLR Parser
Exercise 4.6.2 : Construct the SLR sets of items for the (augmented)
grammar.
Show the parsing table for this grammar. Is the grammar SLR?
Show the actions of your parsing table from Exercise 4.6.2 on the input
aa*a+.
SLR Parser
Exercise 4.6.2: Construct the SLR sets of items for the (augmented) grammar
S → SS+ | SS* | a.
Productions: (1) S → SS+   (2) S → SS*   (3) S → a
FIRST(S) = {a}    FOLLOW(S) = {a, +, *, $}
Fig: LR(0) automaton: I0 –S→ I1, I1 –S→ I3, I3 –+→ I4, I3 –*→ I5;
I0, I1, I3 –a→ I2; accept in I1 on $.
Fig: SLR Parsing table
      ACTION                  GOTO
      a     +     *     $     S
0     s2                      1
1     s2                acc   3
2     r3    r3    r3    r3
3     s2    s4    s5          3
4     r1    r1    r1    r1
5     r2    r2    r2    r2
Show the parsing table for this grammar. Is the grammar SLR?
Yes, the grammar is SLR, because there are no conflicts in the SLR table.
SLR Parsing (Method-2)
Exercise 4.6.2: S → SS+ | SS* | a
Show the actions of your parsing table on the input aa+.
Stack    Input    Action
$0a2     a+$      ACTION[2,a] = r3 (S → a): pop 2*|a| symbols off the stack,
                  push S and then GOTO[0,S] = 1 onto the stack.
$0S1     a+$      ACTION[1,a] = s2: push a and then 2 onto the stack.
...
LR(0) Parsing
LR(0) Parsing
An LR parser using an LR(0)-parsing table is an LR(0) parser.
The LR(0) method begins with LR(0) items and LR(0) automata. That is, given
a grammar, G, we augment G to produce G', with a new start symbol S'.
From G', we construct C, the canonical collection of sets of items for G'
together with the GOTO function.
LR(0) parsing does not care about next input symbol. It does not use
lookahead.
If we could peek at the next token and use that as part of the decision
making, we will find that it allows for a much larger class of grammars to
be parsed.
LR(0) Parsing
Is the following grammar in LR(0)?
S→A
S→ a
A→a
LR(0) Parsing
Is the following grammar in LR(0)?
E→T+E
E→T
T → id
LR(0) Parsing
Is the following grammar in LR(0)?
1) S → A a A b
2) S → B b B a
3) A → ε
4) B → ε
      First    Follow
S:    a b      $
A:    ε        a b
B:    ε        a b
      ACTION                    GOTO
      a        b        $       S   A   B
0     r3/r4    r3/r4    r3/r4   1   2   3
...
State 0 has reduce/reduce conflicts on every input symbol (both A → ε and
B → ε apply), so the grammar is not LR(0).
The simple improvement that SLR(1) makes on the basic LR(0) parser
is to reduce only if the next input token is a member of the follow set
of the non-terminal being reduced.
❑ The GOTO part of the parsing table is filled in the same way, from the
automaton's transitions on nonterminals.
❑ In SLR, the addition of just one token of lookahead and use of the follow set
greatly expands the class of grammars that can be parsed without conflict.
LR Parsing
Viable Prefixes
➢ Not all prefixes of right-sentential forms can appear on the stack, however,
since the parser must not shift past the handle. For example, suppose
➢ Then, at various times during the parse, the stack will hold (, (E, and (E), but it
must not hold (E)*, since (E) is a handle, which the parser must reduce to F
before shifting *.
➢ The prefixes of right sentential forms that can appear on the stack of a shift-
reduce parser are called viable prefixes.
❑ The entire SLR parsing algorithm is based on the idea that the LR(0)
automaton can recognize viable prefixes and reduce them appropriately.
We can easily compute the set of valid items for each viable prefix that
can appear on the stack of an LR parser.
In fact, it is a central theorem of LR-parsing theory that the set of valid items
for a viable prefix γ is exactly the set of items reached from the initial state
along the path labeled γ in the LR(0) automaton for the grammar.
In essence, the set of valid items embodies all the useful information that
can be gleaned from the stack.
LR Parsing
Viable Prefixes
Exercise 4.6.1 : Describe all the viable prefixes for the following
grammars:
More Powerful LR Parsers
Next, we will extend the previous LR parsing techniques to use one
symbol of lookahead on the input.
▪ This method uses a large set of items, called the LR(1) items.
LR Parsing
It is possible to carry more information in the state that will allow us to rule
out some of these invalid reductions by A → α, e.g. by using items of the form [A→α•, a].
By splitting states when necessary, we can arrange to have each state of
an LR parser indicate exactly which input symbols can follow a handle α for
which there is a possible reduction to A.
CLR Parser
Canonical LR(1) Items
The general form of an item becomes [A→α•β, a], where A→αβ is a production
and a is a terminal or the right endmarker $. We call such an object an LR(1) item.
The lookahead has no effect in an item of the form [A→α•β, a] where β is
not ϵ, but an item of the form [A→α•, a] calls for a reduction by A→α only if
the next input symbol is a.
CLR Parser
Constructing LR(1) Sets of Items
      First    Follow
S:    c d      $
C:    c d      c d $
Exercise 4.7.1 : Construct the canonical LR sets of items for S → SS+ | SS* | a
CLR Parser
Productions: (1) S → SS+   (2) S → SS*   (3) S → a
First(S) = {a}    Follow(S) = {a, +, *, $}
Fig: LR(1) automaton: I0 –a→ I2, I0 –S→ I1 (accept in I1 on $);
I1 –a→ I4, I1 –S→ I3; I3 –a→ I4, I3 –+→ I5, I3 –*→ I6, I3 –S→ I7;
I7 –a→ I4, I7 –+→ I8, I7 –*→ I9, I7 –S→ I7.
Fig: CLR Parsing table
      ACTION                  GOTO
      a     +     *     $     S
0     s2                      1
1     s4                acc   3
2     r3                r3
3     s4    s5    s6          7
4     r3    r3    r3
5     r1                r1
6     r2                r2
7     s4    s8    s9          7
8     r1    r1    r1
9     r2    r2    r2
CLR Parser

S -> L=R | R
L -> *R | id
R -> L

        FIRST     FOLLOW
S       * id      $
L       * id      = $
R       * id      = $
Fig: CLR/LR Parsing table

            ACTION                  GOTO
State   =     *     id    $     S     L     R
0             s4    s5          1     2     3
1                         Acc
2       s6                r5
3                         r2
4             s4    s5                8     7
5       r4                r4
6             s11   s12               10    9
7       r3                r3
8       r5                r5
9                         r1
10                        r5
11            s11   s12               10    13
12                        r4
13                        r3
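The same table can be traced with a table-driven driver. This sketch is our own Python transcription of the rows above (the helper name accepts and the production numbering 1: S→L=R, 2: S→R, 3: L→*R, 4: L→id, 5: R→L are our assumptions consistent with the r-entries):

```python
# CLR table for  S -> L=R | R,  L -> *R | id,  R -> L  (sketch)
ACTION = {
    0:  {'*': ('s', 4), 'id': ('s', 5)},
    1:  {'$': 'acc'},
    2:  {'=': ('s', 6), '$': ('r', 5)},
    3:  {'$': ('r', 2)},
    4:  {'*': ('s', 4), 'id': ('s', 5)},
    5:  {'=': ('r', 4), '$': ('r', 4)},
    6:  {'*': ('s', 11), 'id': ('s', 12)},
    7:  {'=': ('r', 3), '$': ('r', 3)},
    8:  {'=': ('r', 5), '$': ('r', 5)},
    9:  {'$': ('r', 1)},
    10: {'$': ('r', 5)},
    11: {'*': ('s', 11), 'id': ('s', 12)},
    12: {'$': ('r', 4)},
    13: {'$': ('r', 3)},
}
GOTO = {0: {'S': 1, 'L': 2, 'R': 3}, 4: {'L': 8, 'R': 7},
        6: {'L': 10, 'R': 9}, 11: {'L': 10, 'R': 13}}
# production -> (head, body length):
# 1: S->L=R  2: S->R  3: L->*R  4: L->id  5: R->L
PRODS = {1: ('S', 3), 2: ('S', 1), 3: ('L', 2), 4: ('L', 1), 5: ('R', 1)}

def accepts(tokens):
    stack, toks, i = [0], list(tokens) + ['$'], 0
    while True:
        act = ACTION[stack[-1]].get(toks[i])
        if act is None:
            return False
        if act == 'acc':
            return True
        if act[0] == 's':
            stack.append(act[1]); i += 1
        else:                           # reduce and take the GOTO
            head, n = PRODS[act[1]]
            del stack[-n:]
            stack.append(GOTO[stack[-1]][head])

print(accepts(['id', '=', 'id']))        # True
print(accepts(['*', 'id', '=', 'id']))   # True (L -> *R on the left side)
print(accepts(['id', '=']))              # False
```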
CLR Parser

S -> L=R | R
L -> *R | id          LR(1), not SLR(1)
R -> L
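Why this grammar is LR(1) but not SLR(1) can be checked directly: the SLR state containing S → L•=R and R → L• must decide on '=' using FOLLOW(R), and '=' is in FOLLOW(R). A small Python sketch of that check (the item encoding is ours, and the FOLLOW sets are written out by hand from the table above):

```python
# SLR conflict check for the state containing  S -> L.=R  and  R -> L.
# FOLLOW sets by hand from  S -> L=R | R,  L -> *R | id,  R -> L
FOLLOW = {'S': {'$'}, 'L': {'=', '$'}, 'R': {'=', '$'}}

# items as (head, body, dot position)
state = {('S', ('L', '=', 'R'), 1),   # S -> L . = R
         ('R', ('L',), 1)}           # R -> L .

shift_syms = {body[dot] for head, body, dot in state if dot < len(body)}
reduce_syms = set().union(*(FOLLOW[head] for head, body, dot in state
                            if dot == len(body)))

conflict = shift_syms & reduce_syms
print(conflict)   # {'='} : a shift/reduce conflict, so not SLR(1)
```

In the canonical LR(1) construction the corresponding item is [R → L•, $], whose lookahead set is {$} alone, so the parser simply shifts '=' and there is no conflict.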
LR Parsing
LALR Parsing
LALR Parser
❑ The LALR method has many fewer states than typical parsers based
on the LR(1) items.
▪ We can handle many more grammars with the LALR method than with the
SLR method, and
▪ build parsing tables that are no bigger than the SLR tables.
Most common syntactic constructs of programming languages can be expressed
conveniently by an LALR grammar. The same is almost true for SLR grammars,
but there are a few constructs that cannot be conveniently handled by SLR
techniques (see Example 4.48, for example).
LALR Parser
Constructing LALR Parsing Tables
For a comparison of parser size, the SLR and LALR tables for a
grammar always have the same number of states, typically several hundred
for a language like C.
▪ The CLR table would typically have several thousand states for the same-
size language.
▪ Thus, it is much easier and more economical to construct SLR and LALR
tables than the canonical LR tables.
LALR Parser
Constructing LALR Parsing Tables
LALR Parser
Constructing LALR Parsing Tables
Example 4.60 : Again, consider grammar (4.55). There are three pairs of sets
of items that can be merged.

[Fig: LR(1) Automata (states I0 to I9) and the merged LALR/LALR(1) Automata,
in which I36, I47, and I89 replace the merged pairs; diagrams not reproduced]

LALR/LALR(1) Parsing table

            ACTION          GOTO
State   c     d     $     S     C
0       s36   s47         1     2
1                   Acc
2       s36   s47               5
36      s36   s47               89
47      r3    r3    r3
5                   r1
89      r2    r2    r2
LALR Parser
Constructing LALR Parsing Tables

Table notation: in an entry such as s5, "s" means shift and the number is a
state number; in an entry such as r3, "r" means reduce and the number is a
production number.

Example 4.60 : Again consider grammar (4.55). There are three pairs of sets
of items that can be merged. I3 and I6 are replaced by their union:
1. S → CC
2. C → cC
3. C → d
LALR Parser
Constructing LALR Parsing Tables
The LALR action and goto functions for the condensed sets of items
are shown in Fig. 4.43.
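The merging in Example 4.60 can be reproduced mechanically. The sketch below is our own encoding in Python: the FIRST sets are written out by hand, and the closure rule takes FIRST of just the symbol after the dotted nonterminal, which is valid because grammar (4.55) has no ϵ-productions. It builds the canonical LR(1) collection and then groups states by their cores:

```python
# Canonical LR(1) collection for  S' -> S,  S -> CC,  C -> cC | d  (sketch).
# Items are (head, body, dot, lookahead) tuples.
GRAMMAR = {"S'": [('S',)], 'S': [('C', 'C')], 'C': [('c', 'C'), ('d',)]}
FIRST = {'S': {'c', 'd'}, 'C': {'c', 'd'}, 'c': {'c'}, 'd': {'d'}}

def closure(items):
    items = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot, la in list(items):
            if dot < len(body) and body[dot] in GRAMMAR:
                B = body[dot]
                # lookaheads = FIRST of what follows B (no epsilons here),
                # or the inherited lookahead when B ends the body
                las = FIRST[body[dot + 1]] if dot + 1 < len(body) else {la}
                for prod in GRAMMAR[B]:
                    for b in las:
                        if (B, prod, 0, b) not in items:
                            items.add((B, prod, 0, b))
                            changed = True
    return frozenset(items)

def goto(items, X):
    return closure({(h, b, d + 1, la) for h, b, d, la in items
                    if d < len(b) and b[d] == X})

start = closure({("S'", ('S',), 0, '$')})
states, todo = {start}, [start]
while todo:
    I = todo.pop()
    for X in {b[d] for _, b, d, _ in I if d < len(b)}:
        J = goto(I, X)
        if J not in states:
            states.add(J)
            todo.append(J)

# LALR merging: group LR(1) states that share the same core (items minus
# lookaheads); three pairs collapse, as in Example 4.60
cores = {frozenset((h, b, d) for h, b, d, _ in I) for I in states}
print(len(states), len(cores))   # 10 LR(1) states merge into 7 LALR states
```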
LALR Parser
Exercise 4.7.1 : Construct the LALR sets of items for S → SS+ | SS* | a

[Fig: LR(1) Automata (states I0 to I9) and the LALR automaton obtained by
merging I2/I4, I3/I7, I5/I8, and I6/I9 into I24, I37, I58, and I69; diagrams
not reproduced]

Fig: LALR Parsing table

            ACTION              GOTO
State   a     +     *     $       S
0       s24                       1
1       s24               Acc     37
24      r3    r3    r3    r3
37      s24   s58   s69           37
58      r1    r1    r1    r1
69      r2    r2    r2    r2
LALR Parser
Specify whether the following grammar is LALR(1) or not
S -> A a | b A c | d c
A -> d
More Powerful LR Parsers
Exercise 4.7.2:
Construct the
a) canonical LR, and
b) LALR
sets of items for the grammar:
(1) S → Aa  (2) S → bAc  (3) S → Bc  (4) S → bBa
(5) A → d   (6) B → d

[Fig: LR(1) Automaton, states I0 to I12; diagram not reproduced]

Fig: LALR Parsing table

            ACTION                    GOTO
State   a     b     c     d     $     S     A     B
0             s3          s59         1     2     4
1                               Acc
2       s6
3                         s9                7     8
4                   s10
5       r5          r6
6                               r1
7                   s11
8       s12
9       r6          r5
10                              r3
11                              r2
12                              r4
59      r5/r6       r5/r6

Merging I5 and I9 (same core, opposite lookaheads) yields state 59, whose
entries on a and c are r5/r6: reduce-reduce conflicts, so the grammar is
not LALR(1).
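The reduce/reduce conflict that appears on merging can be seen with a tiny sketch. This is our own encoding, representing each state as a set of (production, lookahead) pairs meaning "reduce by this production on this lookahead":

```python
# States 5 and 9 have the same core (A -> d.  and  B -> d.) but swapped
# lookahead sets; their LALR union demands two different reductions on
# both lookaheads.
I5 = {('A->d', 'a'), ('B->d', 'c')}   # state 5: reduce A->d on a, B->d on c
I9 = {('A->d', 'c'), ('B->d', 'a')}   # state 9: the lookaheads are swapped

I59 = I5 | I9                         # merged LALR state 59
for la in ('a', 'c'):
    reductions = sorted(p for p, look in I59 if look == la)
    print(la, reductions)   # each lookahead now selects two reductions
```

Each state alone is conflict-free; only the union is ambiguous, which is exactly why this grammar is LR(1) but not LALR(1).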
LR Parsers

  LR(0)  |  SLR or SLR(1)  |  LALR or LALR(1)  |  CLR or CLR(1) or LR or LR(1)
         |  (Simple LR)    |  (LookAhead LR)   |  (Canonical LR)

LR(0) and SLR parsers are built from the LR(0) automaton: a DFA whose states
contain LR(0) items. LALR and CLR parsers are built from the LR(1) automaton,
whose states contain LR(1) items, where an LR(1) item = LR(0) item + lookahead.
5) S -> A a | b A c | d c
   A -> d
Answer: LALR(1) yes; LL(1) no.
Exercise 01:
Consider the following grammar:
S→AA
A→aA|b
Make its CLR, LALR parsing tables and parse the string “abb”.
LR Parsers
Exercise 02:
Consider the following grammar:
S→dA|aB
A→bA|c
B→bB|c
Make its CLR, LALR parsing tables and parse the string “abc”.
LR Parsers
Exercise 03:
Consider the following grammar:
E→E+T|T
T→T*F|F
F → id
Exercise 04:
Consider the following grammar:
S→(X|E]|F)
X→E)|F]
E→A
F→A
Exercise 05:
S→Ab
A→cbAd
A→cAd
A→λ
LR Parsers
Exercise 06:
S→Abbx|Bbby
A→x
B→x
Specify the values of i and j such that the grammar is in LL(i) and LR(j).
LR Parsers
Why is LR(1) so powerful?
Note that this hierarchy refers to grammars, not languages; e.g., there may be
an equivalent LR(1) grammar that accepts the same language as another non-LR(1)
grammar. No ambiguous grammar is LL(1) or LR(1), so we must either rewrite the
grammar to remove the ambiguity or resolve conflicts in the parser table or
implementation.
The hierarchy of LR variants is clear: every LR(0) grammar is SLR(1) and every SLR(1) is
LALR(1) which in turn is LR(1). But there are grammars that don’t meet the
requirements for the weaker forms that can be parsed by the more powerful
variations.
LR Parsers
LL (1) v/s LALR (1)
Error repair:
▪ Both LL(1) and LALR(1) parsers possess the valid/viable prefix property.
What is on the stack will always be a valid prefix of a sentential form.
Errors in both types of parsers can be detected at the earliest possible
point without pushing the next input symbol onto the stack.
▪ LL(1) parse stacks contain symbols that are predicted but not yet
matched. This information can be valuable in determining proper
repairs.
▪ LALR(1) parse stacks contain information about what has already been
seen, but do not have the same information about the right context that
is expected.
▪ This means deciding possible continuations is somewhat easier in an
LL(1) parser.
LR Parsers
LL (1) v/s LALR (1)
Efficiency:
▪ Both require a stack of some sort to manage the input. That stack can
grow to a maximum depth of n, where n is the number of symbols in the
input.
▪ If you are using the runtime stack (i.e. function calls) rather than pushing
and popping on a data stack, you will probably pay some significant
overhead for that convenience (i.e. a recursive descent parser takes
that hit).
▪ If both parsers are using the same sort of stack, LL(1) and LALR(1) each
examine every non-terminal and terminal when building the parse tree,
and so parsing speeds tend to be comparable between the two.
LR Parsers - Exercises
Is the given grammar LL(1), SLR(1), LALR(1), LR(1), or none of these?
P → M * | ε
M → Q StarM | ε
StarM → (* M *) | ( Q * )
Q → o | ε
S → [ X| E )| F [
X → E )| F ]
E → A
F → A
A → ε
LR Parsers - Exercises
Is the given grammar LL(1), SLR(1), LALR(1), CLR(1), or none of these?
L → V ( args )
| L equals Var ( )
V → Var + V
| id
Var → id
❑ The weakness of the SLR(1) and LR(0) parsers means they are only
capable of handling a small set of grammars.
❑ The popular tools yacc and bison generate LALR(1) parsers and most
programming language constructs can be described with an LALR(1)
grammar.
Parser Generators
LAB Instructions
Parser Generators
The Parser Generator - Yacc
The parser drives the lexical analysis: it must know the function that performs
lexical analysis. Hence, we must declare the yylex() function in the definitions
part of the Yacc file.
Similarly, since we expect the Lex file to generate tokens, it must know their
definitions. Hence, we must include the y.tab.h file in the definitions part of
the Lex file.
Parser Generators
The Parser Generator - Yacc
%{
/* C includes */
%}
/* Other Declarations */
%%
/* Rules */
%%
/* user subroutines */
Parser Generators
The Parser Generator - Yacc
➢ Names(terminals) representing tokens must be declared; this is done by
writing
• %token name1 name2 . . .
➢ Every nonterminal symbol must appear on the left side of at least one rule.
➢ If a symbol neither is a token nor appears on the left side of a rule, it’s like
an unreferenced variable in a C program. It doesn’t hurt anything, but it
probably means the programmer made a mistake.
Parser Generators
The Parser Generator - Yacc
– The start symbol, if not declared explicitly with %start, defaults to the
nonterminal on the LHS of the first grammar rule listed.
Parser Generators
The Parser Generator - Yacc
This is done by a series of lines beginning with the yacc keywords %left,
%right, or %nonassoc, followed by a list of tokens. All of the tokens on the
same line are assumed to have the same precedence level and
associativity; the lines are listed in order of increasing precedence or
binding strength. Thus:
The token digit is a single digit between 0 and 9. A Yacc desk calculator
program derived from this grammar is shown in Fig. 4.58.
Parser Generators
The Parser Generator - Yacc
%{
#include <ctype.h>
%}
%token DIGIT
If Lex is used to create the lexical analyzer that passes tokens to the Yacc
parser, then these token declarations are also made available to the
analyzer generated by Lex.
Parser Generators
The Parser Generator - Yacc
In the part of the Yacc specification after the first %% pair, we put the
translation rules.
The rules section simply consists of a list of grammar rules. Since ASCII
keyboards don’t have a → key, we use a colon between the left- and
right-hand sides of a rule, and we put a semicolon at the end of each rule.
Parser Generators
The Parser Generator - Yacc
Parser Generators
Execution
❑ Linux/MacOS
$ lex lexer.l              # generates lex.yy.c
$ yacc -d parser.y         # generates y.tab.c, y.tab.h
$ gcc y.tab.c lex.yy.c -ll -ly   # compile, linking the lex and yacc libraries
$ ./a.out < input          # run the executable
❑ Windows
$ bison -dy prog.y
$ flex hello.l
$ gcc y.tab.c lex.yy.c
$ a.exe < input
References
The central idea behind "Simple LR," or SLR, parsing is the construction of the
LR(0) automaton from the grammar.
Suppose that the string γ of grammar symbols takes the LR(0) automaton
from the start state 0 to some state j.
Example 4.48 : Every SLR(1) grammar is unambiguous, but there are many
unambiguous grammars that are not SLR(1). Consider the grammar with
productions
S → L = R | R
L → * R | id
R → L
▪ Many errors appear syntactic, whatever their cause, and are exposed
efficiently.