
UNIT –II

Bottom-up Parsing

Bottom-up parsing corresponds to the construction of a parse tree for an input string
beginning at the leaves (the bottom nodes) and working up towards the root (the top node). It
involves reducing an input string w to the start symbol of the grammar. In each reduction
step, a particular substring matching the right side of a production is replaced by the symbol on
the left of that production; the sequence of reductions traces out a rightmost derivation in
reverse. For example, consider the following grammar:

E → E+T | T
T → T*F | F
F → (E) | id

Bottom-up parsing of the input string "id*id" proceeds as follows:

Input String    Substring    Reducing production
id*id           id           F → id
F*id            F            T → F
T*id            id           F → id
T*F             T*F          T → T*F
T               T            E → T
E               —            E is the start symbol, so the input string is accepted.

Parse Tree representation is as follows:

Fig: A Bottom-up Parse tree for the input String “id*id”
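The reduction steps in the table above can be imitated with a short Python sketch. This is illustrative only: the naive leftmost-substring matcher below is our own simplification (it happens to reach the start symbol for "id*id", but it is not a correct handle-finding strategy in general, and its intermediate strings may differ from the table above); the function name and production ordering are our own choices.

```python
# Grammar E -> E+T | T, T -> T*F | F, F -> (E) | id, as pairs (lhs, rhs).
# Longer right-hand sides are tried before shorter ones so that, e.g.,
# T*F is reduced as a whole instead of its inner F being rewritten first.
productions = [("F", "id"), ("T", "T*F"), ("E", "E+T"), ("T", "F"), ("E", "T")]

def reduce_to_start(s):
    """Repeatedly replace the leftmost matching substring by the LHS."""
    steps = [s]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            if rhs in s:
                s = s.replace(rhs, lhs, 1)  # reduce leftmost occurrence
                steps.append(s)
                changed = True
                break
    return steps

steps = reduce_to_start("id*id")
```

For "id*id" the final element of `steps` is the start symbol E, mirroring the acceptance in the table above.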

Notes Compiled for M.Sc (Computer Science )Osmania University Mrs. Azmath Mubeen Page 1
Bottom-up parsing is classified into:

1. Shift-reduce parsing
2. Operator precedence parsing
3. Table-driven LR parsing
   i. LR (1)
   ii. SLR (1)
   iii. CLR (1)
   iv. LALR (1)

Shift-Reduce Parsing

Shift-reduce parsing is a form of bottom-up parsing in which a stack holds grammar
symbols and an input buffer holds the rest of the string to be parsed. We use $ to mark the
bottom of the stack and also the right end of the input. The parser applies a sequence of shift
and reduce actions to accept the input string; the parse tree is constructed bottom-up from
the leaf nodes towards the root node.

While parsing the given input string, the parser takes the reduce action whenever a match
for the right side of a production occurs on top of the stack; otherwise it takes the shift action.
Shift-reduce parsing can also handle some ambiguous grammars.

Ex: consider the grammar below to accept the input string "id*id" using a shift-reduce parser:

E → E+T | T
T → T*F | F
F → (E) | id

Actions of the shift-reduce parser using a stack implementation:

STACK    INPUT     ACTION
$        id*id$    Shift
$id      *id$      Reduce with F → id
$F       *id$      Reduce with T → F
$T       *id$      Shift
$T*      id$       Shift
$T*id    $         Reduce with F → id
$T*F     $         Reduce with T → T*F
$T       $         Reduce with E → T
$E       $         Accept
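The stack trace above can be sketched in Python. This is a hand-rolled illustration, not a general parser: instead of consulting a parse table it hard-codes one lookahead check (delaying E → T while '*' follows, so that the T*F handle can form), and the rule ordering, token spelling and function name are our own choices.

```python
# Grammar E -> E+T | T, T -> T*F | F, F -> (E) | id.
# Longer handles are listed first so E+T is preferred over E -> T.
RULES = [("E", ["E", "+", "T"]), ("T", ["T", "*", "F"]),
         ("F", ["(", "E", ")"]), ("F", ["id"]),
         ("T", ["F"]), ("E", ["T"])]

def shift_reduce(tokens):
    """Return (final stack, trace of actions) for a token list."""
    stack, i, trace = [], 0, []
    while True:
        reduced = False
        for lhs, rhs in RULES:
            if stack[-len(rhs):] == rhs:
                # Crude lookahead: do not reduce E -> T while '*' follows,
                # otherwise the later T*F handle could never form.
                if lhs == "E" and rhs == ["T"] and i < len(tokens) and tokens[i] == "*":
                    continue
                stack[-len(rhs):] = [lhs]       # reduce: replace handle by LHS
                trace.append(("reduce", lhs, list(stack)))
                reduced = True
                break
        if reduced:
            continue
        if i < len(tokens):
            stack.append(tokens[i]); i += 1     # shift next input token
            trace.append(("shift", list(stack)))
        else:
            break
    return stack, trace
```

Running `shift_reduce(["id", "*", "id"])` leaves the single symbol E on the stack, matching the Accept row of the table.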

Consider the following grammar:

S → aAcBe
A → Ab | b
B → d

Let the input string be "abbcde". The series of shifts and reductions to the start symbol is
as follows:

abbcde ⇒ aAbcde ⇒ aAcde ⇒ aAcBe ⇒ S

Note: in the above example, two actions are possible in the second step:
1. a shift action, leading to the 3rd step;
2. a reduce action, that is A → b.
If the parser takes the 1st action it can successfully accept the given input string; if it goes
for the second action it cannot accept the given input string. This situation is called a
shift-reduce conflict: the shift-reduce parser is not able to take the proper decision on its own,
so plain shift-reduce parsing is not recommended.

Operator Precedence Parsing

Operator precedence parsing is a kind of shift-reduce parsing method that can be applied
to a small class of grammars called operator grammars, and it can also process some
ambiguous grammars.
An operator grammar has two important characteristics:
1. There are no ε-productions.
2. No production has two adjacent non-terminals.
An operator grammar that accepts expressions is given below:
E → E+E
E → E-E
E → E*E
E → E/E
E → E^E
E → -E
E → (E)
E → id

Two main challenges in operator precedence parsing are:

1. Identification of the correct handle in each reduction step, so that the given input is
reduced to the start symbol of the grammar.
2. Identification of which production to use in each reduction step, so that the given input
is correctly reduced to the start symbol of the grammar.

An operator precedence parser consists of:

1. An input buffer that contains the string to be parsed followed by a $, a symbol used to
indicate the end of the input.
2. A stack containing a sequence of grammar symbols with a $ at the bottom of the stack.
3. An operator precedence relation table O, containing the precedence relations between
pairs of terminals. Three kinds of precedence relations can exist between a pair of
terminals 'a' and 'b':

   1. The relation a <• b implies that terminal 'a' has lower precedence than terminal 'b'.
   2. The relation a •> b implies that terminal 'a' has higher precedence than terminal 'b'.
   3. The relation a =• b implies that terminal 'a' has the same precedence as terminal 'b'.

4. An operator precedence parsing program that takes an input string and determines
whether it conforms to the grammar specification. It uses the operator precedence parse
table and the stack to arrive at the decision.

[Figure: an input buffer a1 a2 a3 … $, a stack with $ at the bottom, the operator precedence
parsing algorithm producing the output, and the operator precedence table consulted by the
algorithm.]

Fig: Components of operator precedence parser

Ex: if the grammar is

E → E+E
E → E-E
E → E*E
E → E/E
E → E^E
E → -E
E → (E)
E → id

construct the operator precedence table and accept the input string "id+id*id".

The precedence order of the operators is

(id) > (^) > (* /) > (+ -) > $, where '^' is right associative and all remaining
operators are left associative.

        +    -    *    /    ^    id   (    )    $
+       •>   •>   <•   <•   <•   <•   <•   •>   •>
-       •>   •>   <•   <•   <•   <•   <•   •>   •>
*       •>   •>   •>   •>   <•   <•   <•   •>   •>
/       •>   •>   •>   •>   <•   <•   <•   •>   •>
^       •>   •>   •>   •>   <•   <•   <•   •>   •>
id      •>   •>   •>   •>   •>   Err  Err  •>   •>
(       <•   <•   <•   <•   <•   <•   <•   =•   Err
)       •>   •>   •>   •>   •>   Err  Err  •>   •>
$       <•   <•   <•   <•   <•   <•   <•   Err  Err

The intention of the precedence relations is to delimit the handle of the given input string,
with <• marking the left end of the handle and •> marking the right end of the handle.

Parsing Action

To locate the handle, the following steps are followed:

1. Add a $ symbol at both ends of the given input string.
2. Scan the input string from left to right until the first •> is encountered.
3. Scan towards the left over all equal precedences until the first <• is encountered.
4. Everything between <• and •> is a handle.
5. $ on $ means parsing is successful.

Ex: if the input string is "id*id" and the grammar is

E → E+E
E → E*E
E → id

explain the parsing actions.

1. $ <• id •> * <• id •> $

The first handle is 'id' and the match for 'id' in the grammar is E → id.
So, id is replaced with the non-terminal E, and the input string can be written as

2. $ <• E •> * <• id •> $

The parser does not consider non-terminals as input, so they are dropped from the
input string, which becomes

3. $ <• * <• id •> $

The next handle is 'id' and the match for 'id' in the grammar is E → id.
So, id is replaced with the non-terminal E, and the input string can be written as

4. $ <• * <• E •> $

Dropping the non-terminal again, the string becomes

5. $ <• * •> $

The next handle is '*' and the match for '*' in the grammar is E → E*E.
So, the handle is replaced with the non-terminal E, and the input string can be written as

6. $ E $

Dropping the non-terminal, the string becomes

7. $ $

$ on $ means parsing is successful.

Operator Precedence Parsing Algorithm

The operator precedence parsing program determines the action of the parser
depending on:

1. 'a', the topmost terminal symbol on the stack, and

2. 'b', the current input symbol.

There are 3 combinations of 'a' and 'b' that are important for the parsing program:

1. a = b = $: the parsing is successful.

2. a <• b or a =• b: the parser shifts the input symbol on to the stack and
advances the input pointer to the next input symbol.
3. a •> b: the parser performs a reduce action. The parser pops out elements one
by one from the stack until the current top-of-stack terminal has lower
precedence than the most recently popped terminal.
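The three cases above can be sketched as a small Python driver. This is a sketch under assumptions: the relation table is restricted to +, *, id and $ for brevity (a slice of the full table above), non-terminals are never pushed (mirroring the "parser does not consider non-terminals" convention of the worked example), and the function name is our own.

```python
# Precedence relations between terminal pairs: '<' for <., '>' for .>.
PREC = {
    ("+", "+"): ">", ("+", "*"): "<", ("+", "id"): "<", ("+", "$"): ">",
    ("*", "+"): ">", ("*", "*"): ">", ("*", "id"): "<", ("*", "$"): ">",
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("$", "+"): "<", ("$", "*"): "<", ("$", "id"): "<",
}

def op_precedence_parse(tokens):
    """Return the list of actions taken; last action is 'accept' on success."""
    stack, buf, i, actions = ["$"], tokens + ["$"], 0, []
    while True:
        a, b = stack[-1], buf[i]
        if a == "$" and b == "$":
            actions.append("accept")          # case 1: a = b = $
            return actions
        if PREC[(a, b)] in "<=":
            stack.append(b); i += 1           # case 2: shift
            actions.append(f"shift {b}")
        else:                                 # case 3: a .> b, reduce
            while True:
                t = stack.pop()
                if PREC[(stack[-1], t)] == "<":
                    break                     # handle delimited by <.
            actions.append(f"reduce handle ending at {t}")
```

For `["id", "*", "id"]` the driver shifts, reduces each id, then reduces the `*`, and finally reports acceptance, matching the stack trace shown next.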

Ex: the sequence of actions taken by the parser using the stack for the input string "id*id":

STACK    INPUT     ACTION
$        id*id$    $ <• id, shift 'id' on to the stack
$id      *id$      id •> *, reduce 'id' using E → id
$E       *id$      $ <• *, shift '*' on to the stack
$E*      id$       * <• id, shift 'id' on to the stack
$E*id    $         id •> $, reduce 'id' using E → id
$E*E     $         * •> $, reduce '*' using E → E*E
$E       $         $ = $ = $, so parsing is successful

[Parse tree: E at the root with children E, *, E; each leaf E derives id.]

Fig: parse tree for "id*id" using operator precedence parser

Advantages and Disadvantages of Operator Precedence Parsing

The following are the advantages of operator precedence parsing:

1. It is a simple and easy-to-implement parsing technique.

2. The operator precedence parser can be constructed by hand after understanding the
grammar, and it is simple to debug.

The following are the disadvantages of operator precedence parsing:

1. It is difficult to handle an operator like '-' which can be either unary or binary and
hence has different precedences and associativities.

2. It can parse only a small class of grammars.

3. Addition or deletion of rules requires the parser to be rewritten.

4. There are too many error entries in the parsing table.

LR Parsing

The most prevalent type of bottom-up parsing is LR(k) parsing, where L stands for a
left-to-right scan of the given input string, R stands for a rightmost derivation in reverse, and
k is the number of input symbols of lookahead.

• It is the most general non-backtracking shift-reduce parsing method.

• The class of grammars that can be parsed using the LR methods is a proper superset of
the class of grammars that can be parsed with predictive parsers.

• An LR parser can detect a syntactic error as soon as it is possible to do so on a
left-to-right scan of the input.

[Figure: an input buffer a1 a2 a3 … $, a stack, the LR parsing algorithm producing the output,
and an LR parsing table with ACTION (shift) and GOTO parts.]

Fig: Components of LR Parsing

An LR parser consists of:

• An input buffer that contains the string to be parsed followed by a $ symbol, used to
indicate the end of the input.

• A stack containing a sequence of grammar symbols with a $ at the bottom of the stack;
initially the stack contains the initial state of the parsing table on top of $.

• A parsing table (M): a two-dimensional array M[state, terminal or non-terminal] with
two parts:

1. ACTION part
The ACTION part of the table is a two-dimensional array indexed by state and
input symbol, i.e. ACTION[state][input]. An ACTION table entry can have one of
the following four kinds of values:
1. Shift X, where X is a state number.

2. Reduce X, where X is a production number.

3. Accept, signifying the completion of a successful parse.

4. Error entry.

2. GOTO part

The GOTO part of the table is a two-dimensional array indexed by state and a
non-terminal, i.e. GOTO[state][non-terminal]. A GOTO entry holds a state
number.

• The parsing algorithm uses the current state X and the next input symbol 'a' to consult
the entry ACTION[X][a]. It takes one of the following four actions:

1. If ACTION[X][a] = shift Y, the parser pushes Y on to the top of the stack
and advances the input pointer.

2. If ACTION[X][a] = reduce Y (Y is the number of the production being reduced in
state X), and the production is Y → β, then the parser pops 2*|β| symbols from
the stack and pushes Y on to the stack.

3. If ACTION[X][a] = accept, then the parsing is successful and the input string is
accepted.

4. If ACTION[X][a] = error, then the parser has discovered an error and calls the
error routine.
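The four-way dispatch above can be sketched as a Python driver loop. Assumptions to note: the table encoded here is the conflict-free SLR-style table for the grammar S → aB, B → bB | b built later in these notes, and since this sketch keeps only state numbers on the stack (not symbol–state pairs), a reduction by A → β pops |β| entries rather than the 2*|β| of the description above.

```python
# ACTION maps (state, terminal) to ("s", state), ("r", lhs, |rhs|) or ("acc",).
# Missing entries are the error entries of the table.
ACTION = {
    (0, "a"): ("s", 2), (1, "$"): ("acc",),
    (2, "b"): ("s", 4),
    (3, "$"): ("r", "S", 2),
    (4, "b"): ("s", 4), (4, "$"): ("r", "B", 1),
    (5, "$"): ("r", "B", 2),
}
GOTO = {(0, "S"): 1, (2, "B"): 3, (4, "B"): 5}

def lr_parse(tokens):
    """Table-driven LR loop: True if the token list is accepted."""
    stack, buf, i = [0], tokens + ["$"], 0
    while True:
        act = ACTION.get((stack[-1], buf[i]))
        if act is None:
            return False                    # error entry
        if act[0] == "s":
            stack.append(act[1]); i += 1    # shift: push state, advance input
        elif act[0] == "r":
            _, lhs, k = act
            del stack[len(stack) - k:]      # reduce A -> beta: pop |beta| states
            stack.append(GOTO[(stack[-1], lhs)])
        else:
            return True                     # accept
```

Strings of the form a b…b are accepted; anything else hits an error entry.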

LR parsing is classified into:

1. LR (1)
2. Simple LR (1)
3. Canonical LR (1)
4. Look-ahead LR (1)

LR (1) Parsing

Various steps involved in LR(1) parsing:

1. Write the context-free grammar for the given input string.
2. Check for ambiguity.
3. Add the augment production.
4. Create the canonical collection of LR(0) items.
5. Draw the DFA.
6. Construct the LR(1) parsing table.
7. Based on the information from the table, generate the output with the help of the
stack and the parsing algorithm.

Augment Grammar

The augmented grammar G' is G with a new start symbol S' and an additional production
S' → S. This helps the parser identify when to stop parsing and announce acceptance of the
input: the input string is accepted if and only if the parser is about to reduce by S' → S. For
example, consider the grammar below:

E → E+T | T
T → T*F | F
F → (E) | id

The augmented grammar G' is represented by:

E' → E
E → E+T | T
T → T*F | F
F → (E) | id

NOTE: augmenting a grammar simply adds one extra production while preserving the
actual meaning of the given grammar G.

Canonical collection of LR (0) items

LR (0) items
An LR(0) item of a grammar G is a production of G with a dot at some position in the right
side of the production. An item indicates how much of a production's right side has been
seen at a given point in the parse. For example, if the production is X → AB then the LR(0)
items are:

1. X → •AB, indicating that the parser expects a string derivable from AB.
2. X → A•B, indicating that the parser has scanned a string derivable from A and
expects a string derivable from B.
3. X → AB•, indicating that the parser has scanned a string derivable from AB.

If the production is X → ε, then the only LR(0) item is

X → •, indicating that the production is a reduced one.

Canonical collection
This is the process of grouping the LR(0) items together into states, based on the Closure
and Goto operations.

Closure operation
If I is an initial state, then Closure(I) is constructed as follows:
1. Initially, add the augment production to the state and check the symbol after the • in
the right-hand side: if the • is followed by a non-terminal, add the productions
starting with that non-terminal to state I.
2. If a production X → α•Aβ is in I, then add every production starting with A to
state I. Rule 2 is applied until no more productions can be added to state I
(that is, until every • is followed by a terminal symbol or stands at the end of a
production).
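The two closure rules above can be sketched in Python. Representation choices here are our own: an item is a tuple (lhs, rhs, dot-position), and the grammar is the augmented expression grammar used in the example that follows.

```python
# Augmented grammar E' -> E, E -> E+T | T, T -> T*F | F, F -> (E) | id.
GRAMMAR = {
    "E'": [("E",)],
    "E":  [("E", "+", "T"), ("T",)],
    "T":  [("T", "*", "F"), ("F",)],
    "F":  [("(", "E", ")"), ("id",)],
}

def closure(items):
    """Rule 2 applied to a fixed point: whenever the dot precedes a
    non-terminal B, add every item B -> .prod."""
    items = set(items)
    while True:
        new = set()
        for lhs, rhs, dot in items:
            if dot < len(rhs) and rhs[dot] in GRAMMAR:
                for prod in GRAMMAR[rhs[dot]]:
                    new.add((rhs[dot], prod, 0))
        if new <= items:          # nothing further to add: state is closed
            return items
        items |= new

I0 = closure({("E'", ("E",), 0)})
```

Starting from the single augment item, I0 grows to the seven items E' → •E, E → •E+T, E → •T, T → •T*F, T → •F, F → •(E) and F → •id.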

Ex: for the grammar

0. E' → E
1. E → E+T
2. E → T
3. T → T*F
4. T → F
5. F → (E)
6. F → id

the corresponding LR(0) items are E' → •E, E → •E+T, E → •T, T → •T*F, T → •F,
F → •(E) and F → •id.

Closure (I0) state:
Add E' → •E to the I0 state.
Since the '•' symbol in the right-hand side is followed by the non-terminal E, add the
productions starting with E to the I0 state. So, the state becomes:

0. E' → •E
1. E → •E+T
2. E → •T

The 1st and 2nd productions satisfy the 2nd rule, so add the productions which are
starting with E and T to I0.
Note: once a production has been added to the state, the same production should not be
added a 2nd time to the same state. So, the state becomes:

0. E' → •E
1. E → •E+T
2. E → •T
3. T → •T*F
4. T → •F
5. F → •(E)
6. F → •id

Goto operation
Goto(I0, X), where I0 is a set of items and X is the grammar symbol over which we move
the '•' symbol. It is like finding the next state of an NFA for a given state I0 and input
symbol X. For example, for the items E' → •E and E → •E+T:

Goto(I0, E) = Closure({E' → E•, E → E•+T})

Note: once we complete the Goto operation, we need to compute the Closure operation on
the resulting items.

[Figure: state I0 containing E' → •E, E → •E+T, T → •T*F, with a transition on E to the
state containing E' → E• and E → E•+T.]
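The Goto operation, together with the Closure it needs, can be sketched as follows. As before, the item representation (lhs, rhs, dot) and the function names are our own choices; the grammar is the augmented expression grammar from above.

```python
# Augmented grammar E' -> E, E -> E+T | T, T -> T*F | F, F -> (E) | id.
GRAMMAR = {
    "E'": [("E",)],
    "E":  [("E", "+", "T"), ("T",)],
    "T":  [("T", "*", "F"), ("F",)],
    "F":  [("(", "E", ")"), ("id",)],
}

def closure(items):
    """Add B -> .prod for every item whose dot precedes non-terminal B."""
    items = set(items)
    while True:
        new = set()
        for lhs, rhs, dot in items:
            if dot < len(rhs) and rhs[dot] in GRAMMAR:
                for prod in GRAMMAR[rhs[dot]]:
                    new.add((rhs[dot], prod, 0))
        if new <= items:
            return items
        items |= new

def goto(items, X):
    """Move the dot over X in every item that has X after the dot, then close."""
    moved = {(lhs, rhs, dot + 1)
             for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == X}
    return closure(moved)

I0 = closure({("E'", ("E",), 0)})
I1 = goto(I0, "E")   # the state {E' -> E., E -> E.+T} shown above
```

Goto(I0, E) yields exactly the two items E' → E• and E → E•+T; the closure step adds nothing because the dot now precedes a terminal or the end of the production.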

Construction of the LR (1) parsing table:

Once we have created the canonical collection of LR(0) items, we need to follow the steps
mentioned below:

If there is a transition from one state (Ii) to another state (Ij) on a terminal value, we
should write the shift entry in the ACTION part: for an item A → α•aβ in Ii with
Goto(Ii, a) = Ij, set ACTION[Ii][a] = Sj.

If there is a transition from one state (Ii) to another state (Ij) on a non-terminal value, we
should write the subscript value of Ij in the GOTO part: for an item B → α•Aβ in Ii with
Goto(Ii, A) = Ij, set GOTO[Ii][A] = j.

If a state (Ii) contains a completed production (one with the • at the end and hence no
transition), the production is said to be a reduced production. Such productions get reduce
entries, along with their production numbers, in the ACTION part of the row for Ii. If the
augment production is being reduced, write accept in the ACTION part.

[Illustration: if production 1 is A → αβ• in state Ii, the row for Ii gets r1 in the ACTION
part under every terminal (here a and $).]

Ex: construct the LR(1) parsing table for the given grammar (G):

S → aB
B → bB | b

Sol: 1. Add the augment production and insert the '•' symbol at the first position of every
production in G:

0. S' → •S
1. S → •aB
2. B → •bB
3. B → •b

I0 state:
1. Add the augment production to the I0 state and compute the Closure:

I0 = Closure(S' → •S)
Since '•' is followed by the non-terminal S, add all productions starting with
S to the I0 state. So, the I0 state becomes:
I0 = S' → •S
     S → •aB

Here, in the S production the '•' symbol is followed by a terminal
value, so close the state.

I1 = Goto(I0, S) = Closure(S' → S•) = S' → S•
Here, the production is reduced, so close the state.

I1 = S' → S•
I2 = Goto(I0, a) = Closure(S → a•B)
Here, the '•' symbol is followed by the non-terminal B. So, add
the productions starting with B:

B → •bB
B → •b
Here, the '•' symbol in each B production is followed by a terminal value. So,
close the state.

I2 = S → a•B
     B → •bB
     B → •b

I3 = Goto(I2, B) = Closure(S → aB•) = S → aB•

I4 = Goto(I2, b) = Closure({B → b•B, B → b•})

Add the productions starting with B to I4:

B → •bB
B → •b
The dot symbol is followed by a terminal value. So, close the state.

I4 = B → b•B
     B → b•
     B → •bB
     B → •b

I5 = Goto(I4, B) = Closure(B → bB•) = B → bB•

Goto(I4, b) = I4

Drawing the DFA:

[DFA: I0 {S' → •S, S → •aB} --S--> I1 {S' → S•};
I0 --a--> I2 {S → a•B, B → •bB, B → •b};
I2 --B--> I3 {S → aB•};
I2 --b--> I4 {B → b•B, B → b•, B → •bB, B → •b};
I4 --B--> I5 {B → bB•};
I4 --b--> I4.]

LR Table:

States |        ACTION        | GOTO
       |  a     b       $     | S   B
I0     |  S2                  | 1
I1     |              ACCEPT  |
I2     |        S4            |     3
I3     |  R1    R1      R1    |
I4     |  R3    S4/R3   R3    |     5
I5     |  R2    R2      R2    |

Note: if there are multiple entries in the LR(1) parsing table, the grammar is not accepted
by the LR(1) parser. In the above table the I4 row has two entries (S4/R3) for the single
terminal value 'b'; this is called a shift-reduce conflict.

Shift-Reduce Conflict in LR (1) Parsing

A shift-reduce conflict in LR(1) parsing occurs when a state has
1. a reduced item of the form B → b• (production 2, say), and
2. an incomplete item of the form A → β•aα (production 1),
so that on the terminal 'a' the ACTION entry holds both a shift and a reduce:
ACTION[Ii][a] = Sj/r2, where Goto(Ii, a) = Ij, and ACTION[Ii][$] = r2.

Reduce-Reduce Conflict in LR (1) Parsing

A reduce-reduce conflict in LR(1) parsing occurs when a state has two or more
reduced items of the form
1. A → α•
2. B → β•
so that the ACTION entries of the row for Ii hold two reductions at once:
ACTION[Ii][a] = r1/r2 and ACTION[Ii][$] = r1/r2.

SLR (1) Parsing

Various steps involved in SLR(1) parsing:

1. Write the context-free grammar for the given input string.

2. Check for ambiguity.
3. Add the augment production.

4. Create the canonical collection of LR(0) items.

5. Draw the DFA.

6. Construct the SLR(1) parsing table.

7. Based on the information from the table, generate the output with the help of the
stack and the parsing algorithm.

SLR (1) Table Construction

Once we have created the canonical collection of LR(0) items, we need to follow the steps
mentioned below:

If there is a transition from one state (Ii) to another state (Ij) on a terminal value, we
should write the shift entry in the ACTION part: for an item A → α•aβ in Ii with
Goto(Ii, a) = Ij, set ACTION[Ii][a] = Sj.

If there is a transition from one state (Ii) to another state (Ij) on a non-terminal value, we
should write the subscript value of Ij in the GOTO part: for an item B → α•Aβ in Ii with
Goto(Ii, A) = Ij, set GOTO[Ii][A] = j.

If a state (Ii) contains a completed production A → αβ• (production 2, say) which has no
transition to a next state, the production is said to be a reduced production. For every
terminal X in FOLLOW(A), write the reduce entry r2 in ACTION[Ii][X]. If the augment
production is reducing, then write accept.

Ex: for the grammar

1. S → aAb
2. A → αβ

Follow(S) = {$}
Follow(A) = {b}

so the row for the state Ii containing A → αβ• gets r2 only under b.
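The FOLLOW sets that the SLR reduce rule relies on can be computed with a short fixed-point sketch. Assumptions: the grammar encoded here is the S → aB, B → bB | b example used next; since it has no ε-productions, FIRST of a symbol string is just the FIRST of its first symbol, which keeps the code small.

```python
# Grammar S -> aB, B -> bB | b as (lhs, rhs-list) pairs.
PRODS = [("S", ["a", "B"]), ("B", ["b", "B"]), ("B", ["b"])]
NT = {"S", "B"}

def compute_follow():
    """FOLLOW(A): terminals that can appear right after A; $ for the start."""
    # FIRST sets first (no epsilon productions, so only the leading symbol matters).
    firsts = {A: set() for A in NT}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in PRODS:
            f = firsts[rhs[0]] if rhs[0] in NT else {rhs[0]}
            if not f <= firsts[lhs]:
                firsts[lhs] |= f; changed = True
    follow = {A: set() for A in NT}
    follow["S"].add("$")                      # start symbol is followed by $
    changed = True
    while changed:
        changed = False
        for lhs, rhs in PRODS:
            for i, X in enumerate(rhs):
                if X not in NT:
                    continue
                if i + 1 < len(rhs):          # something follows X in this rhs
                    nxt = rhs[i + 1]
                    new = firsts[nxt] if nxt in NT else {nxt}
                else:                          # X at the end: inherit FOLLOW(lhs)
                    new = follow[lhs]
                if not new <= follow[X]:
                    follow[X] |= new; changed = True
    return follow
```

For this grammar the computation confirms FOLLOW(S) = FOLLOW(B) = {$}, the sets used in the SLR table below.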

SLR (1) table for the grammar:

S → aB
B → bB | b

Follow(S) = {$}

Follow(B) = {$}

States |       ACTION       | GOTO
       |  a     b      $    | S   B
I0     |  S2                | 1
I1     |            ACCEPT  |
I2     |        S4          |     3
I3     |            R1      |
I4     |        S4     R3   |     5
I5     |            R2      |

Note: when multiple entries occur in the SLR table, the grammar is not accepted by the
SLR(1) parser.

Conflicts in SLR (1) Parsing

When multiple entries occur in the table, the situation is said to be a conflict.

Shift-Reduce Conflict in SLR (1) Parsing

A shift-reduce conflict in SLR(1) parsing occurs when a state has
1. a reduced item of the form B → b• (production 2) where FOLLOW(B) includes the
terminal value 'a', and
2. an incomplete item of the form A → β•aα (production 1),
so that ACTION[Ii][a] = Sj/r2, where Goto(Ii, a) = Ij.

Reduce-Reduce Conflict in SLR (1) Parsing

A reduce-reduce conflict in SLR(1) parsing occurs when a state has two or more
reduced items of the form
1. A → α•
2. B → β•
with Follow(A) ∩ Follow(B) ≠ ∅. For example, if the grammar is

S → aAa | aBa
A → α
B → β

Follow(S) = {$}
Follow(A) = {a} and Follow(B) = {a}

then the state containing both A → α• and B → β• gets ACTION[Ii][a] = r1/r2.

Canonical LR (1) Parsing

Various steps involved in CLR(1) parsing:

1. Write the context-free grammar for the given input string.

2. Check for ambiguity.

3. Add the augment production.

4. Create the canonical collection of LR(1) items.

5. Draw the DFA.

6. Construct the CLR(1) parsing table.

7. Based on the information from the table, generate the output with the help of the
stack and the parsing algorithm.

LR (1) item
An LR(1) item is defined by a production, the position of the dot, and a terminal symbol.
The terminal is called the lookahead symbol.

The general form of an LR(1) item is [A → α•Bβ, a]; for such an item, closure adds the
items [B → •γ, b] for every terminal b in FIRST(βa).

Rules to create the canonical collection:

1. Every element of I is added to Closure(I).
2. If an LR(1) item [X → A•BC, a] exists in I, and there exists a production B → b1b2…,
then add the item [B → •b1b2…, z], where z is a terminal in FIRST(Ca), if it is not
already in Closure(I). Keep applying this rule until no more elements are added.
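The lookahead rule above can be sketched in Python. Assumptions: an item is a tuple (lhs, rhs, dot, lookahead); the grammar is the S → CC example used next; and because that grammar has no ε-productions, FIRST of a string reduces to FIRST of its first symbol.

```python
# Augmented grammar S' -> S, S -> CC, C -> cC | d.
GRAMMAR = {"S'": [("S",)], "S": [("C", "C")], "C": [("c", "C"), ("d",)]}
NT = set(GRAMMAR)

def first(sym):
    """FIRST of one symbol (safe here: the grammar is not left-recursive)."""
    if sym not in NT:
        return {sym}
    out = set()
    for rhs in GRAMMAR[sym]:
        out |= first(rhs[0])
    return out

def closure(items):
    """For [A -> u.Bv, a] add [B -> .w, b] for every b in FIRST(va)."""
    items = set(items)
    while True:
        new = set()
        for lhs, rhs, dot, la in items:
            if dot < len(rhs) and rhs[dot] in NT:
                # lookaheads: FIRST of what follows B, or the item's own
                # lookahead when B is at the end of the right-hand side
                las = first(rhs[dot + 1]) if dot + 1 < len(rhs) else {la}
                for prod in GRAMMAR[rhs[dot]]:
                    for b in las:
                        new.add((rhs[dot], prod, 0, b))
        if new <= items:
            return items
        items |= new

I0 = closure({("S'", ("S",), 0, "$")})
```

The computed I0 contains the six items S' → •S, $; S → •CC, $; C → •cC, c/d; and C → •d, c/d, matching the state derived by hand below.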

Ex: if the grammar is

S → CC
C → cC
C → d

the canonical collection of LR(1) items can be created as follows:

0. S' → S (augment production)
1. S → CC
2. C → cC
3. C → d

I0 state:

Add the augment production and compute the Closure; the lookahead symbol for the
augment production is $.

I0 = Closure(S' → •S, $)

The dot symbol is followed by the non-terminal S. So,
add the productions starting with S to the I0 state:

S → •CC, FIRST($), using the 2nd rule

S → •CC, $

The dot symbol is followed by the non-terminal C. So,
add the productions starting with C to the I0 state:

C → •cC, FIRST(C$)
C → •d, FIRST(C$)

FIRST(C) = {c, d}, so the items are

C → •cC, c/d
C → •d, c/d

The dot symbol is now followed by a terminal value in every added item, so close the
I0 state. The productions in I0 are:

S' → •S, $
S → •CC, $
C → •cC, c/d
C → •d, c/d

I1 = Goto(I0, S) = S' → S•, $

I2 = Goto(I0, C) = Closure(S → C•C, $)

Add C → •cC, $ and C → •d, $. So, the I2 state is:

S → C•C, $
C → •cC, $
C → •d, $

I3 = Goto(I0, c) = Closure(C → c•C, c/d)

Add C → •cC, c/d and C → •d, c/d. So, the I3 state is:

C → c•C, c/d
C → •cC, c/d
C → •d, c/d

I4 = Goto(I0, d) = Closure(C → d•, c/d) = C → d•, c/d

I5 = Goto(I2, C) = Closure(S → CC•, $) = S → CC•, $

I6 = Goto(I2, c) = Closure(C → c•C, $)

Add C → •cC, $ and C → •d, $. So, the I6 state is:

C → c•C, $
C → •cC, $
C → •d, $

I7 = Goto(I2, d) = Closure(C → d•, $) = C → d•, $

Goto(I3, c) = Closure(C → c•C, c/d) = I3

I8 = Goto(I3, C) = Closure(C → cC•, c/d) = C → cC•, c/d

Goto(I3, d) = Closure(C → d•, c/d) = I4

I9 = Goto(I6, C) = Closure(C → cC•, $) = C → cC•, $

Goto(I6, c) = Closure(C → c•C, $) = I6

Goto(I6, d) = Closure(C → d•, $) = I7

Drawing the DFA for the above LR(1) items:

[DFA: I0 --S--> I1 {S' → S•, $};
I0 --C--> I2 {S → C•C, $; C → •cC, $; C → •d, $};
I0 --c--> I3 {C → c•C, c/d; C → •cC, c/d; C → •d, c/d};
I0 --d--> I4 {C → d•, c/d};
I2 --C--> I5 {S → CC•, $};
I2 --c--> I6 {C → c•C, $; C → •cC, $; C → •d, $};
I2 --d--> I7 {C → d•, $};
I3 --C--> I8 {C → cC•, c/d};  I3 --c--> I3;  I3 --d--> I4;
I6 --C--> I9 {C → cC•, $};    I6 --c--> I6;  I6 --d--> I7.]
Construction of the CLR (1) Table

Rule 1: if there is an item [A → α•Xβ, b] in Ii and Goto(Ii, X) = Ij, then set
ACTION[Ii][X] = shift j, where X is a terminal.
Rule 2: if there is an item [A → α•, b] in Ii (A ≠ S'), set ACTION[Ii][b] = reduce, along
with the production number.
Rule 3: if there is an item [S' → S•, $] in Ii, then set ACTION[Ii][$] = accept.
Rule 4: if there is an item [A → α•Xβ, b] in Ii and Goto(Ii, X) = Ij, then set
GOTO[Ii][X] = j, where X is a non-terminal.

States |      ACTION      | GOTO
       |  c    d     $    | S   C
I0     |  S3   S4         | 1   2
I1     |           ACCEPT |
I2     |  S6   S7         |     5
I3     |  S3   S4         |     8
I4     |  R3   R3         |
I5     |           R1     |
I6     |  S6   S7         |     9
I7     |           R3     |
I8     |  R2   R2         |
I9     |           R2     |

Tab: CLR (1) Table

LALR (1) Parsing

The CLR parser avoids the conflicts in the parse table, but it produces more states than
the SLR parser, so its table occupies more space in memory. LALR parsing can be used
instead: the tables obtained are smaller than the CLR parse table, yet the parser is almost as
powerful as the CLR parser. Here, LR(1) items that have the same productions but different
lookaheads are combined to form a single set of items.
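The merging idea can be sketched directly: states whose items share the same "core" (production and dot position) are unioned. The state encoding and the function names are our own; the two sample states mirror I4 and I7 from the example above.

```python
# An LR(1) item is (lhs, rhs, dot, lookahead); a state is a set of items.
# I4 and I7 from the notes: C -> d. with lookaheads c/d and $ respectively.
I4 = {("C", ("d",), 1, "c"), ("C", ("d",), 1, "d")}
I7 = {("C", ("d",), 1, "$")}

def core(state):
    """The lookahead-free part of a state."""
    return frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in state)

def merge_states(states):
    """Union all states that share a core, as LALR construction does."""
    merged = {}
    for st in states:
        merged.setdefault(core(st), set()).update(st)
    return list(merged.values())

merged = merge_states([I4, I7])   # one state I47 with lookaheads c, d and $
```

I4 and I7 collapse into a single state whose item C → d• carries the union lookahead set {c, d, $}, exactly the state I47 described below.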

For example, consider the grammar in the previous example, and the states I4 and I7
given below:

I4 = Goto(I0, d) = Closure(C → d•, c/d) = C → d•, c/d

I7 = Goto(I2, d) = Closure(C → d•, $) = C → d•, $

These states differ only in the lookaheads; they have the same productions.
Hence these states are combined to form a single state called I47.

Similarly, the states I3 and I6 differ only in their lookaheads, as given below:

I3 = Goto(I0, c) =
C → c•C, c/d
C → •cC, c/d
C → •d, c/d

I6 = Goto(I2, c) =
C → c•C, $
C → •cC, $
C → •d, $

These states differ only in the lookaheads and have the same productions.
Hence these states are combined to form a single state called I36.

Similarly, the states I8 and I9 differ only in lookaheads. Hence they are combined to form
the state I89.

States |       ACTION        | GOTO
       |  c     d      $     | S   C
I0     |  S36   S47          | 1   2
I1     |              ACCEPT |
I2     |  S36   S47          |     5
I36    |  S36   S47          |     89
I47    |  R3    R3     R3    |
I5     |              R1     |
I89    |  R2    R2     R2    |

Tab: LALR Table

Conflicts in CLR (1) Parsing

When multiple entries occur in the table, the situation is said to be a conflict.

Shift-Reduce Conflict in CLR (1) Parsing

A shift-reduce conflict in CLR(1) parsing occurs when a state has
1. a reduced item of the form B → b•, a (production 2), and
2. an incomplete item of the form A → β•aα (production 1),
so that ACTION[Ii][a] = Sj/r2, where Goto(Ii, a) = Ij.

Reduce-Reduce Conflict in CLR (1) Parsing

A reduce-reduce conflict in CLR(1) parsing occurs when a state has two or more
reduced items of the form
1. A → α•, a
2. B → β•, a
that is, when two productions in a state (I) reduce on the same lookahead symbol, so that
ACTION[Ii][a] = r1/r2.

String Acceptance using LR Parsing:
Consider the above example, with the input string "cdd" and the CLR(1) table:

States |      ACTION      | GOTO
       |  c    d     $    | S   C
I0     |  S3   S4         | 1   2
I1     |           ACCEPT |
I2     |  S6   S7         |     5
I3     |  S3   S4         |     8
I4     |  R3   R3         |
I5     |           R1     |
I6     |  S6   S7         |     9
I7     |           R3     |
I8     |  R2   R2         |
I9     |           R2     |

0. S' → S (augment production)
1. S → CC
2. C → cC
3. C → d

STACK      INPUT    ACTION
$0         cdd$     Shift S3
$0c3       dd$      Shift S4
$0c3d4     d$       Reduce with R3 (C → d): pop 2*|β| symbols from the stack;
                    Goto(I3, C) = 8
$0c3C8     d$       Reduce with R2 (C → cC): pop 2*|β| symbols from the stack;
                    Goto(I0, C) = 2
$0C2       d$       Shift S7
$0C2d7     $        Reduce with R3 (C → d): pop 2*|β| symbols from the stack;
                    Goto(I2, C) = 5
$0C2C5     $        Reduce with R1 (S → CC): pop 2*|β| symbols from the stack;
                    Goto(I0, S) = 1
$0S1       $        Accept
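The trace above can be replayed mechanically by the same driver-loop idea introduced earlier, now loaded with the CLR(1) table for this grammar. As before, this sketch keeps only states on the stack, so a reduction by A → β pops |β| entries instead of 2*|β|; the encoding of the table is our own.

```python
# CLR(1) table for S' -> S, S -> CC, C -> cC | d (states I0..I9 above).
ACTION = {
    (0, "c"): ("s", 3), (0, "d"): ("s", 4), (1, "$"): ("acc",),
    (2, "c"): ("s", 6), (2, "d"): ("s", 7),
    (3, "c"): ("s", 3), (3, "d"): ("s", 4),
    (4, "c"): ("r", "C", 1), (4, "d"): ("r", "C", 1),
    (5, "$"): ("r", "S", 2),
    (6, "c"): ("s", 6), (6, "d"): ("s", 7),
    (7, "$"): ("r", "C", 1),
    (8, "c"): ("r", "C", 2), (8, "d"): ("r", "C", 2),
    (9, "$"): ("r", "C", 2),
}
GOTO = {(0, "S"): 1, (0, "C"): 2, (2, "C"): 5, (3, "C"): 8, (6, "C"): 9}

def lr_parse(tokens):
    """True if the token list is accepted by the CLR(1) table above."""
    stack, buf, i = [0], tokens + ["$"], 0
    while True:
        act = ACTION.get((stack[-1], buf[i]))
        if act is None:
            return False                    # blank entry: error
        if act[0] == "s":
            stack.append(act[1]); i += 1    # shift
        elif act[0] == "r":
            _, lhs, k = act
            del stack[len(stack) - k:]      # reduce: pop |rhs| states
            stack.append(GOTO[(stack[-1], lhs)])
        else:
            return True                     # accept
```

`lr_parse(list("cdd"))` follows exactly the state sequence of the hand trace (3, 4, 8, 2, 7, 5, 1) and accepts; a string like "cd", which is a single C rather than CC, hits a blank entry and is rejected.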

An LR parser will detect an error when it consults the parsing ACTION table and finds a
blank or error entry. Errors are never detected by consulting the GOTO table. An LR parser
will detect an error as soon as there is no valid continuation for the portion of the input
scanned so far. A canonical LR parser will not make even a single reduction before
announcing the error. SLR and LALR parsers may make several reductions before detecting
an error, but they will never shift an erroneous input symbol onto the stack.
Panic-mode Error Recovery :We can implement panic-mode error recovery by scanning down
the stack until a state s with a goto on a particular nonterminal A is found. Zero or more input
symbols are then discarded until a symbol a is found that can legitimately follow A. The parser
then stacks the state GOTO(s, A) and resumes normal parsing. The situation might exist where
there is more than one choice for the nonterminal A. Normally these would be nonterminals
representing major program pieces, e.g. an expression, a statement, or a block. For example, if A
is the nonterminal stmt, a might be semicolon or }, which marks the end of a statement sequence.
This method of error recovery attempts to eliminate the phrase containing the syntactic error.
The parser determines that a string derivable from A contains an error. Part of that string has
already been processed, and the result of this processing is a sequence of states on top of the
stack. The remainder of the string is still in the input, and the parser attempts to skip over the
remainder of this string by looking for a symbol on the input that can legitimately follow A. By
removing states from the stack, skipping over the input, and pushing GOTO(s, A) on the stack,
the parser pretends that it has found an instance of A and resumes normal parsing.
Phrase-level Recovery: Phrase-level recovery is implemented by examining each error entry in
the LR action table and deciding on the basis of language usage the most likely programmer
error that would give rise to that error. An appropriate recovery procedure can then be
constructed; presumably the top of the stack and/or first input symbol would be modified in a
way deemed appropriate for each error entry. In designing specific error-handling routines for an
LR parser, we can fill in each blank entry in the action field with a pointer to an error routine that
will take the appropriate action selected by the compiler designer. The actions may include
insertion or deletion of symbols from the stack or the input or both, or alteration and
transposition of input symbols. We must make our choices so that the LR parser will not get into
an infinite loop. A safe strategy will assure that at least one input symbol will be removed or
shifted eventually, or that the stack will eventually shrink if the end of the input has been
reached. Popping a stack state that covers a nonterminal should be avoided, because this
modification eliminates from the stack a construct that has already been successfully parsed.
Error productions
Some common errors are known to the compiler designers that may occur in the code. In
addition, the designers can create augmented grammar to be used, as productions that generate
erroneous constructs when these errors are encountered.
Global correction
The parser considers the program in hand as a whole and tries to figure out what the program is
intended to do and tries to find out a closest match for it, which is error-free. When an erroneous
input (statement) X is fed, it creates a parse tree for some closest error-free statement Y. This
may allow the parser to make minimal changes in the source code, but due to the complexity
(time and space) of this strategy, it has not been implemented in practice yet.

Handling Ambiguous Grammars

Ambiguity

. A Grammar can have more than one parse tree for a string

Consider grammar

string → string + string
       | string - string
       | 0 | 1 | ... | 9

. String 9-5+2 has two parse trees

A grammar is said to be an ambiguous grammar if there is some string that it can generate in
more than one way (i.e., the string has more than one parse tree or more than one leftmost
derivation). A language is inherently ambiguous if it can only be generated by ambiguous
grammars.

For example, consider the following grammar:

string → string + string
       | string - string
       | 0 | 1 | ... | 9

In this grammar, the string 9-5+2 has two possible parse trees as shown in the next slide.

Consider the parse trees for string 9-5+2, expression like this has more than one parse tree. The
two trees for 9-5+2 correspond to the two ways of parenthesizing the expression: (9-5)+2 and 9-
(5+2). The second parenthesization gives the expression the value 2 instead of 6.
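The difference is easy to check numerically. Each C function below hard-codes one of the two parse trees as an explicit parenthesization:

```c
/* Each function encodes one parse tree for 9-5+2. */
int left_tree(void)  { return (9 - 5) + 2; }  /* (9-5)+2 = 6 */
int right_tree(void) { return 9 - (5 + 2); }  /* 9-(5+2) = 2 */
```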

Ambiguity is problematic because the meaning of programs can be incorrect

. Ambiguity can be handled in several ways

- Enforce associativity and precedence

- Rewrite the grammar (cleanest way)

There are no general techniques for handling ambiguity

. It is impossible to convert automatically an ambiguous grammar to an unambiguous one

Ambiguity is harmful to the intent of the program. The input might be interpreted in a way that
was not the programmer's intention, as shown above in the 9-5+2 example. There is no general
technique to handle ambiguity, i.e., it is impossible to build a procedure that automatically
identifies and removes ambiguity from an arbitrary grammar. However, broadly speaking, it can
be removed in the following possible ways:

1) Rewriting the whole grammar unambiguously.

2) Implementing precedence and associativity rules in the grammar. We shall discuss this
technique in the later slides.

If an operand has operators on both sides, the side whose operator takes the operand determines
the associativity of that operator.

. In a+b+c, b is taken by the left +

. +, -, *, / are left associative

. ^, = are right associative

. Grammar to generate strings with right-associative operators:

right → letter = right | letter
letter → a | b | ... | z

A binary operation * on a set S that does not satisfy the associative law is called non-
associative. A left-associative operation is a non-associative operation that is conventionally
evaluated from left to right i.e., operand is taken by the operator on the left side.

For example,

6*5*4 = (6*5)*4 and not 6*(5*4)

6/5/4 = (6/5)/4 and not 6/(5/4)
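A quick check in C, where / is left-associative, confirms the grouping; note that under integer division the two groupings give different results:

```c
int as_written(void)  { return 6 / 5 / 4; }    /* parsed as (6/5)/4      */
int left_group(void)  { return (6 / 5) / 4; }  /* (6/5)/4 = 1/4 = 0      */
int right_group(void) { return 6 / (5 / 4); }  /* 6/(5/4) = 6/1 = 6      */
```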

A right-associative operation is a non-associative operation that is conventionally evaluated from
right to left i.e., operand is taken by the operator on the right side.

For example,

6^5^4 => 6^(5^4) and not (6^5)^4

x=y=z=5 => x=(y=(z=5))
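C's assignment operator behaves the same way, so the chained-assignment example can be checked directly:

```c
/* x = y = z = 5 groups as x = (y = (z = 5)): every variable gets 5. */
int chained_assign(void) {
    int x, y, z;
    x = y = z = 5;
    return x * 100 + y * 10 + z;   /* encodes all three values */
}
```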

Following is the grammar to generate strings with left-associative operators. (Note that this
grammar is left recursive and would send a top-down parser into an infinite loop; we will handle
this problem later by making it right recursive.)

left → left + letter | letter
letter → a | b | ... | z

Yacc (Yet Another Compiler-Compiler) is a computer program for the Unix operating system. It
is a Look Ahead Left-to-Right (LALR) parser generator: it generates a parser, the part of a
compiler that tries to make syntactic sense of the source code, from an analytic grammar written
in a notation similar to Backus–Naur Form (BNF). Yacc itself used to be available as the default
parser generator on most Unix systems, though it has since been supplanted as the default by
more recent, largely compatible programs. You provide a grammar specification as input, and it
generates an LALR(1) parser that recognizes sentences in that grammar. yacc stands for "yet
another compiler compiler", and it is probably the most common of the LALR tools out there.
How It Works
yacc is designed for use with C code and generates a parser written in C. The parser is
configured for use in conjunction with a lex-generated scanner and relies on standard shared
features (token types, yylval, etc.) and calls the function yylex as a scanner coroutine. You
provide a grammar specification file, which is traditionally named using a .y extension. You
invoke yacc on the .y file and it creates the y.tab.h and y.tab.c files containing a thousand or so
lines of intense C code that implements an efficient LALR(1) parser for your grammar, including
the code for the actions you specified. The file provides an extern function yyparse that will
attempt to successfully parse a valid sentence. You compile that C file normally, link with the
rest of your code, and you have a parser! By default, the parser reads from stdin and writes to
stdout, just like a lex-generated scanner does

% yacc myFile.y                           creates y.tab.c of C code for parser
% gcc -c y.tab.c                          compiles parser code
% gcc -o parse y.tab.o lex.yy.o -ll -ly   links parser, scanner, libraries
% parse                                   invokes parser, reads from stdin
The Makefiles we provide for the projects will execute the above compilation steps for you, but
it is worthwhile to understand the steps required.
yacc File Format
Your input file is organized as follows (note the intentional similarities to lex):

%{
Declarations
%}
Definitions
%%
Productions
%%
User subroutines

The optional Declarations and User subroutines sections are used for ordinary C code that you
want copied verbatim to the generated C file; declarations are copied to the top of the file, user subroutines to
the bottom. The optional Definitions section is where you configure various parser features such
as defining token codes, establishing operator precedence and associativity, and setting up the
global variables used to communicate between the scanner and parser. The required Productions
section is where you specify the grammar rules. As in lex, you can associate an action with each
pattern (this time a production), which allows you to do whatever processing is needed as you
reduce using that production.
Example
Let's look at a simple, but complete, specification to get our bearings. Here is a yacc input file
for a simple calculator that recognizes and evaluates binary postfix expressions using a stack.
%{
#include <stdio.h>
#include <assert.h>
static int Pop();
static int Top();
static void Push(int val);
%}
%token T_Int
%%
S : S E '\n' { printf("= %d\n", Top()); }
|
;
E : E E '+' { Push(Pop() + Pop()); }
| E E '-' { int op2 = Pop(); Push(Pop() - op2); }
| E E '*' { Push(Pop() * Pop()); }
| E E '/' { int op2 = Pop(); Push(Pop() / op2); }
| T_Int { Push(yylval); }
;
%%
static int stack[100], count = 0;
static int Pop() {
assert(count > 0);
return stack[--count];
}
static int Top() {
assert(count > 0);
return stack[count-1];
}
static void Push(int val) {
assert(count < sizeof(stack)/sizeof(*stack));
stack[count++] = val;
}
int main() {
return yyparse();
}
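The stack discipline used by the actions above can be exercised outside of yacc as well. The sketch below re-implements the same evaluation over a plain string of single-digit postfix tokens; it is a hypothetical stand-in for the lex/yacc pipeline, not part of the generated parser.

```c
#include <ctype.h>
#include <assert.h>

static int stk[100], cnt;
static void push(int v) { assert(cnt < 100); stk[cnt++] = v; }
static int  pop(void)   { assert(cnt > 0);   return stk[--cnt]; }

/* Evaluate a postfix expression of single-digit operands, mirroring the
   actions in the yacc productions above; any other character is ignored,
   just like in the lex rule. */
int eval_postfix(const char *s) {
    cnt = 0;
    for (; *s != '\0'; s++) {
        int op2;
        if (isdigit((unsigned char)*s)) { push(*s - '0'); continue; }
        switch (*s) {
        case '+': push(pop() + pop());            break;
        case '-': op2 = pop(); push(pop() - op2); break;
        case '*': push(pop() * pop());            break;
        case '/': op2 = pop(); push(pop() / op2); break;
        }
    }
    return pop();   /* the final value, like Top() at end of line */
}
```

For instance, eval_postfix("9 5 - 2 +") computes (9-5)+2 = 6, exactly as the yacc actions would.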

A few things worth pointing out in the above example:


• All token types returned from the scanner must be defined using %token in the
definitions section. This establishes the numeric codes that will be used by the
scanner to tell the parser about each token scanned. In addition, the global
variable yylval is used to store additional attribute information about the
lexeme itself.
• For each rule, a colon is used in place of the arrow, a vertical bar separates the
various productions, and a semicolon terminates the rule. Unlike lex, yacc pays
no attention to line boundaries in the rules section, so you’re free to use lots of
space to make the grammar look pretty.
• Within the braces for the action associated with a production is just ordinary C
code. If no action is present, the parser will take no action upon reducing that
production.
• The first rule in the file is assumed to identify the start symbol for the grammar.
• yyparse is the function generated by yacc. It reads input from stdin,
attempting to work its way back from the input through a series of reductions
back to the start symbol. The return code from the function is 0 if the parse was
successful and 1 otherwise. If it encounters an error (i.e. the next token in the
input stream cannot be shifted), it calls the routine yyerror, which by default
prints the generic "parse error" message and quits.

In order to try out our parser, we need to create the scanner for it. Here is the lex file
we used:
%{
#include "y.tab.h"
%}
%%
[0-9]+ { yylval = atoi(yytext); return T_Int;}
[-+*/\n] { return yytext[0];}
. { /* ignore everything else */ }
Given the above specification, yylex will return the ASCII representation of the
calculator operators, recognize integers, and ignore all other characters. When it
assembles a series of digits into an integer, it converts to a numeric value and stores in
yylval (the global reserved for passing attributes from the scanner to the parser). The
token type T_Int is returned.
In order to tie this all together, we first run yacc on the grammar specification
to generate the y.tab.c and y.tab.h files, and then run lex/flex on the scanner
specification to generate the lex.yy.c file.
Compile the two .c files and link them together, and voilà, a calculator is born! Here is the
Makefile we could use:
calc: lex.yy.o y.tab.o
	gcc -o calc lex.yy.o y.tab.o -ly -ll
lex.yy.c: calc.l y.tab.c
	flex calc.l
y.tab.c: calc.y
	yacc calc.y

Tokens, Productions, and Actions


By default, lex and yacc agree to use the ASCII character codes for all single-char
tokens without requiring explicit definitions. For all multi-char tokens, there needs to be
an agreed-upon numbering system, and all of these tokens need to be specifically
defined. The %token directive establishes token names and assigns numbers.
%token T_Int T_Double T_String T_While
The above line would be included in the definitions section. It is translated by yacc into
C code as a sequence of #defines for T_Int, T_Double, and so on, using the numbers
257 and higher. These are the codes returned from yylex as each multicharacter token
is scanned and identified. The C declarations for the token codes are exported in the
generated y.tab.h header file. #include that file in other modules (in the scanner, for
example) to stay in sync with the parser-generated definitions.
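For the %token line above, the generated y.tab.h would contain something along these lines. The exact values are assigned by yacc; the numbers below just illustrate the 257-and-up numbering scheme.

```c
/* Hypothetical excerpt of a generated y.tab.h */
#define T_Int    257
#define T_Double 258
#define T_String 259
#define T_While  260
```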
Productions and their accompanying actions are specified using the following format:
left_side : right_side1 { action1 }
| right_side2 { action2 }
| right_side3 { action3 }
;
The left side is the non-terminal being described. Non-terminals are named like typical
identifiers: using a sequence of letters and/or digits, starting with a letter. Nonterminals
do not have to be defined before use, but they do need to be defined
eventually. The first non-terminal defined in the file is assumed to be the start symbol for
the grammar.
Each right side is a valid expansion for the left side non-terminal. Vertical bars separate
the alternatives, and the entire list is punctuated by a semicolon. Each right side is a list
of grammar symbols, separated by white space. The symbols can either be nonterminals
or terminals. Terminal symbols can either be individual character constants,
e.g. 'a' or token codes defined using %token such as T_While.
The C code enclosed in curly braces after each right side is the associated action.
Whenever the parser recognizes that production and is ready to reduce, it executes the
action to process it. When the entire right side is assembled on top of the parse stack
and the lookahead indicates a reduce is appropriate, the associated action is executed
right before popping the handle off the stack and following the goto for the left side
non-terminal. The code you include in the actions depends on what processing you
need. The action might be to build a section of the parse tree, evaluate an arithmetic
expression, declare a variable, or generate code to execute an assignment statement.
Although it is most common for actions to appear at the end of the right side, it is also
possible to place actions in-between grammar symbols. Those actions will be executed
at that point when the symbols to the left are on the stack and the symbols to the right
are coming up. These embedded actions can be a little tricky because they require the
parser to commit to the current production early, more like a predictive parser, and can
introduce conflicts if there are still open alternatives at that point in the parse.
Symbol Attributes
The parser allows you to associate attributes with each grammar symbol, both terminals
and non-terminals. For terminals, the global variable yylval is used to communicate
the particulars about the token just scanned from the scanner to the parser. For nonterminals,
you can explicitly access and set their attributes using the attribute stack.
By default, YYSTYPE (the attribute type) is just an integer. Usually you want different
information for various symbol types, so a union type can be used instead. You indicate
what fields you want in the union via the %union directive.
%union {
int intValue;
double doubleValue;
char *stringValue;
}
The above line would be included in the definitions section. It is translated by yacc into
C code as a YYSTYPE typedef for a new union type with the above fields. The global
variable yylval is of this type, and the parser stores values of this type on the parser
stack, one for each symbol currently on the stack.
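In other words, the %union above would be translated into roughly the following C declarations. This is a sketch of the relevant fragment; the generated file also contains parser internals.

```c
/* Sketch of the YYSTYPE typedef yacc emits for the %union above. */
typedef union {
    int intValue;
    double doubleValue;
    char *stringValue;
} YYSTYPE;

YYSTYPE yylval;   /* the global used to pass token attributes */
```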
When defining each token, you can identify which field of the union is applicable to this
token type by preceding the token name with the fieldname enclosed in angle brackets.
The fieldname is optional (for example, it is not relevant for tokens without attributes).
%token <intValue> T_Int <doubleValue> T_Double T_While T_Return
To set the attribute for a non-terminal, use the %type directive, also in the definitions
section. This establishes which field of the union is applicable to instances of the nonterminal:
%type <intValue> Integer IntExpression
To access a grammar symbol's attribute from the parse stack, there are special variables
available for the C code within an action. $n is the attribute for the nth symbol of the
current right side, counting from 1 for the first symbol. The attribute for the nonterminal
on the left side is denoted by $$. If you have set the type of the token or nonterminal,
then it is clear which field of the attributes union you are accessing. If you have
not set the type (or you want to overrule the defined field), you can specify with the
notation $<fieldname>n. A typical use of attributes in an action might be to gather the
attributes of the various symbols on the right side and use that information to set the
attribute for the non-terminal on the left side.
A similar mechanism is used to obtain information about symbol locations. For each
symbol on the stack, the parser maintains a variable of type YYLTYPE, which is a
structure containing four members: first line, first column, last line, and last column. To
obtain the location of a grammar symbol on the right side, you simply use the notation
@n, completely parallel to $n. The location of a terminal symbol is furnished by the
lexical analyzer via the global variable yylloc. During a reduction, the location of the
non-terminal on the left side is automatically set using the combined location of all
symbols in the handle that is being reduced.
Error handling in YACC
When a yacc-generated parser encounters an error (i.e. the next input token cannot be shifted
given the sequence of things so far on the stack), it calls the default yyerror routine to print a
generic "parse error" message and halt parsing. However, quitting in response to the very first
error is not particularly helpful! yacc supports a form of error re-synchronization that allows you
to define where in the stream to give up on an unsuccessful parse and how far to scan ahead and
try to cleanup to allow the parse to restart. The special error token can be used in the right side of
a production to mark a context in which to attempt error recovery. The usual use is to add the
error token possibly followed by a sequence of one or more synchronizing tokens, which tell the
parser to discard any tokens until it sees that "familiar" sequence that allows the parser to
continue. A simple and useful strategy might be simply to skip the rest of the current input line
or current statement when an error is detected. When an error is encountered (and reported via
yyerror), the parser will discard any partially parsed rules (i.e. pop states from the parse stack)
until it finds one in which it can shift an error token. It then reads and discards input tokens until
it finds one that can follow the error token in that production. For example:

Var : Modifiers Type IdentifierList ';'
    | error ';'
    ;

The second production allows for handling errors encountered when trying to recognize Var by
accepting the alternate sequence error followed by a semicolon. What happens if the parser is in
the middle of processing the IdentifierList when it encounters an error? The error recovery rule,
interpreted strictly, applies to the precise sequence of an error and a semicolon. If an error occurs
in the middle of an IdentifierList, there are Modifiers and a Type and what not on the stack,
which doesn't seem to fit the pattern. However, yacc can force the situation to fit the rule, by
discarding the partially processed rules (i.e. popping states from the stack) until it gets back to a
state in which the error token is acceptable (i.e. all the way back to the state at which it started
before matching the Modifiers). At this point the error token can be shifted. Then, if the
lookahead token couldn't possibly be shifted, the parser reads tokens, discarding them until it
finds a token that is acceptable. In this example, yacc reads and discards input until the next
semicolon so that the error rule can apply. It then reduces that to Var, and continues on from
there. This basically allowed a mangled variable declaration to be ignored up to the terminating
semicolon, which is a reasonable attempt at getting the parser back on track. Note that if a
specific token follows the error symbol, the parser will discard until it finds that token; otherwise
it will discard until it finds any token in the lookahead set for the nonterminal (for example,
anything that can follow Var in the example above).

In our postfix calculator from the beginning of the handout, we can add an error production to
recover from a malformed expression by discarding the rest of the line up to the newline and
allow the next line to be parsed normally:

S : S E '\n' { printf("= %d\n", Top()); }
  | error '\n' { printf("Error! Discarding entire line.\n"); }
  |
  ;

Like any other yacc rule, one that contains an error can have an associated action. It would be
typical at this point to clean up after the error and other necessary
housekeeping so that the parse can resume. Where should one put error productions? It’s more of
an art than a science. Putting error tokens fairly high up tends to protect you from all sorts of
errors by ensuring there is always a rule to which the parser can recover. On the other hand, you
usually want to discard as little input as possible before recovering, and error tokens at lower
level rules can help minimize the number of partially matched rules that are discarded during
recovery. One reasonable strategy is to add error tokens fairly high up and use punctuation as the
synchronizing tokens. For example, in a programming language expecting a list of declarations,
it might make sense to allow error as one of the alternatives for a list entry. If punctuation
separates elements of a list, you can use that punctuation in error rules to help find
synchronization points—skipping ahead to the next parameter or the next statement in a list.
Trying to add error actions at too low a level (say in expression handling) tends to be more
difficult because of the lack of structure that allows you to determine when the expression ends
and where to pick back up. Sometimes adding error actions into more than one level may
introduce conflicts into the grammar because more than one error action is a possible recovery.
Another mechanism: deliberately add incorrect productions into the grammar specification,
allow the parser to help you recognize these illegal forms, and then use the action to handle the
error manually. For example, let's say you're writing a Java compiler and you know some poor C
programmer is going to forget that you can't specify the size in a Java array declaration. You
could add an alternate array declaration that allows for a size, but knowing it was illegal, the
action for this production reports an error and discards the erroneous size token. Most compilers
tend to focus their efforts on trying to recover from the most common error situations (forgetting
a semicolon, mistyping a keyword), but don't put a lot of effort into dealing with more wacky
inputs.
Error recovery is largely a trial and error process. For fun, you can experiment with your favorite
compilers to see how good a job they do at recognizing errors (they are expected to hit this one
perfectly), reporting errors clearly (less than perfect), and gracefully recovering (far from
perfect).

Semantics
Semantics of a language provide meaning to its constructs, like tokens and syntax structure.
Semantics help interpret symbols, their types, and their relations with each other. Semantic
analysis judges whether the syntax structure constructed in the source program derives any
meaning or not.

CFG + semantic rules = Syntax Directed Definitions


For example:

int a = “value”;
should not issue an error in lexical and syntax analysis phase, as it is lexically and structurally
correct, but it should generate a semantic error as the type of the assignment differs. These rules
are set by the grammar of the language and evaluated in semantic analysis. The following tasks
should be performed in semantic analysis:

Scope resolution
Type checking
Array-bound checking
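As a sketch of the type-checking task, a checker with a strict no-coercion rule would reject the int a = "value"; example. The type names and function below are invented for illustration.

```c
typedef enum { TY_INT, TY_STRING } Type;

/* Returns 1 if assigning an expression of type rhs to a variable of
   type lhs is legal; under a strict rule the types must match exactly. */
int check_assign(Type lhs, Type rhs) {
    return lhs == rhs;
}
```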
Semantic Errors
We have mentioned some of the semantics errors that the semantic analyzer is expected to
recognize:

Type mismatch
Undeclared variable
Reserved identifier misuse.
Multiple declaration of variable in a scope.
Accessing an out of scope variable.
Actual and formal parameter mismatch.
Attribute Grammar
Attribute grammar is a special form of context-free grammar where some additional information
(attributes) is appended to one or more of its non-terminals in order to provide context-sensitive
information. Each attribute has well-defined domain of values, such as integer, float, character,
string, and expressions.

Attribute grammar is a medium to provide semantics to the context-free grammar and it can help
specify the syntax and semantics of a programming language. Attribute grammar (when viewed
as a parse-tree) can pass values or information among the nodes of a tree.

Example:

E → E + T { E.value = E.value + T.value }


The right part of the CFG contains the semantic rules that specify how the grammar should be
interpreted. Here, the values of non-terminals E and T are added together and the result is copied
to the non-terminal E.

Semantic attributes may be assigned values from their domain at the time of parsing and
evaluated at the time of assignment or conditions. Based on the way the attributes get their
values, they can be broadly divided into two categories : synthesized attributes and inherited
attributes.

Synthesized attributes
These attributes get values from the attribute values of their child nodes. To illustrate, assume the
following production:

S → ABC
If S takes values from its child nodes (A, B, C), then it is said to be a synthesized attribute, as
the values of A, B, and C are synthesized into S.

As in our previous example (E → E + T), the parent node E gets its value from its child node.
Synthesized attributes never take values from their parent nodes or any sibling nodes.
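Bottom-up evaluation of synthesized attributes can be sketched as a recursive walk in which each node's value is computed only from its children, as in E.value = E.value + T.value. The tree representation below is invented for illustration.

```c
typedef struct Node {
    char op;                    /* '+', '-', or 'n' for a number leaf */
    int  num;                   /* leaf value, valid when op == 'n'   */
    struct Node *left, *right;
} Node;

/* Each node's value is synthesized purely from the values of its
   children, computed first. */
int eval(const Node *n) {
    if (n->op == 'n') return n->num;
    int l = eval(n->left), r = eval(n->right);
    return (n->op == '+') ? l + r : l - r;
}
```

Evaluating the left-associative tree for 9-5+2, i.e. (9-5)+2, yields 6.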

Inherited attributes
In contrast to synthesized attributes, inherited attributes can take values from parent and/or
siblings. As in the following production,

S → ABC
A can get values from S, B and C. B can take values from S, A, and C. Likewise, C can take
values from S, A, and B.

Expansion: When a non-terminal is expanded to terminals as per a grammatical rule.

Reduction: When a terminal is reduced to its corresponding non-terminal according to grammar
rules. Syntax trees are parsed top-down and left to right. Whenever reduction occurs, we apply
its corresponding semantic rules (actions).

Semantic analysis uses Syntax Directed Translations to perform the above tasks.

Semantic analyzer receives AST (Abstract Syntax Tree) from its previous stage (syntax
analysis).

Semantic analyzer attaches attribute information with AST, which are called Attributed AST.

Attributes are a two-tuple value: <attribute name, attribute value>

For example:

int value = 5;
<type, “integer”>
<presentvalue, “5”>
For every production, we attach a semantic rule.

S-attributed SDT
If an SDT uses only synthesized attributes, it is called an S-attributed SDT. These attributes are
evaluated using S-attributed SDTs that have their semantic actions written after the production
(on the right hand side).

S-attributed SDT
As depicted above, attributes in S-attributed SDTs are evaluated in bottom-up parsing, as the
values of the parent nodes depend upon the values of the child nodes.

L-attributed SDT
This form of SDT uses both synthesized and inherited attributes with restriction of not taking
values from right siblings.

In L-attributed SDTs, a non-terminal can get values from its parent, child, and sibling nodes. As
in the following production

S → ABC
S can take values from A, B, and C (synthesized). A can take values from S only. B can take
values from S and A. C can get values from S, A, and B. No non-terminal can get values from
the sibling to its right.

Attributes in L-attributed SDTs are evaluated by depth-first and left-to-right parsing manner.

L-attributed SDT

We may conclude that if a definition is S-attributed, then it is also L-attributed, as the class of
L-attributed definitions encloses the class of S-attributed definitions.

. Check semantics
. Error reporting
. Disambiguate overloaded operators
. Type coercion
. Static checking
- Type checking
- Control flow checking
- Uniqueness checking
- Name checks
