Lecture#8 - Chap#2 (Syntax Directed Translator (Part-IV) )

Lecture# 08
Compiler Construction
A Simple Syntax-Directed Translator,
(Part-IV)
by Safdar Hussain
Topics
• FIRST Sets, Left Recursion, Left Factoring
• Translator for Simple Expressions, Adding a Lexical Analyzer
• Token Attributes, Generic Instructions for Stack Manipulation
Overview
• This chapter contains introductory material
• Building a simple compiler
– Syntax Definition
– Syntax-Directed Translation
– Parsing
– A Translator for Simple Expressions
– The Lexical Analyzer
Safdar Hussain, Khwaja Fareed University, RYK 2

Top-down Parsing (Key Points)
• In general, the selection of a production for a nonterminal
may involve trial and error; that is, we may have to try a
production and backtrack to try another production if the first
is found to be unsuitable.
• A production is unsuitable if, after using the production, we
cannot complete the tree to match the input string.
• Predictive parsing, a special case of top-down parsing, does
not need backtracking.
Grammar

Predictive Parsing
• Recursive descent parsing is a top-down parsing
method
– Every nonterminal has one (recursive) procedure
responsible for parsing the nonterminal’s syntactic
category of input tokens
– When a nonterminal has multiple productions, each
production is implemented in a branch of a selection
statement based on input look-ahead information
• Predictive parsing is a special form of recursive
descent parsing where we use one lookahead token
to unambiguously determine the parse operations

FIRST
• FIRST(α) is the set of terminals that appear as the
first symbols of one or more strings generated from α
• If α is ε or can generate ε, then ε is also in FIRST(α)
type → simple
     | ^ id
     | array [ simple ] of type
simple → integer
       | char
       | num dotdot num
FIRST(simple) = { integer, char, num }
FIRST(^ id) = { ^ }
FIRST(type) = { integer, char, num, ^, array }
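The FIRST sets above can also be computed mechanically. The following is a minimal sketch in C, assuming the type/simple grammar shown here; because that grammar has no ε-productions, only the leftmost symbol of each right-hand side is inspected. The names compute_first, in_first and the T_/NT_ symbol constants are illustrative and not part of the lecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Grammar symbols: terminals first, then nonterminals (names are illustrative). */
enum symbol {
    T_INTEGER, T_CHAR, T_NUM, T_CARET, T_ID, T_ARRAY,
    T_LBRACKET, T_RBRACKET, T_OF, T_DOTDOT,
    NUM_TERMINALS,
    NT_TYPE = NUM_TERMINALS, NT_SIMPLE,
    NUM_SYMBOLS
};

/* A production is reduced to its left-hand side and the leftmost symbol of
   its right-hand side. This is enough only because no production is empty. */
struct production { int lhs; int leftmost; };

static const struct production grammar[] = {
    { NT_TYPE,   NT_SIMPLE },  /* type   -> simple                   */
    { NT_TYPE,   T_CARET   },  /* type   -> ^ id                     */
    { NT_TYPE,   T_ARRAY   },  /* type   -> array [ simple ] of type */
    { NT_SIMPLE, T_INTEGER },  /* simple -> integer                  */
    { NT_SIMPLE, T_CHAR    },  /* simple -> char                     */
    { NT_SIMPLE, T_NUM     },  /* simple -> num dotdot num           */
};

/* first_set[X][t] is true when terminal t is in FIRST(X). */
static bool first_set[NUM_SYMBOLS][NUM_TERMINALS];

/* Fixed-point iteration: repeat until no FIRST set grows any further. */
void compute_first(void)
{
    for (int t = 0; t < NUM_TERMINALS; t++)
        first_set[t][t] = true;            /* FIRST(t) = { t } for a terminal t */

    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t p = 0; p < sizeof grammar / sizeof grammar[0]; p++) {
            int a = grammar[p].lhs, x = grammar[p].leftmost;
            for (int t = 0; t < NUM_TERMINALS; t++) {
                if (first_set[x][t] && !first_set[a][t]) {
                    first_set[a][t] = true;   /* FIRST(x) is a subset of FIRST(a) */
                    changed = true;
                }
            }
        }
    }
}

/* Membership query: is terminal t in FIRST(symbol)? */
bool in_first(int symbol, int terminal)
{
    assert(symbol >= 0 && symbol < NUM_SYMBOLS);
    assert(terminal >= 0 && terminal < NUM_TERMINALS);
    return first_set[symbol][terminal];
}
```

After compute_first() has run, in_first(NT_TYPE, T_ARRAY) holds while in_first(NT_TYPE, T_ID) does not, matching the sets listed on the slide. A grammar with ε-productions would need the fuller algorithm that also tracks which symbols can derive ε.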

Using FIRST
We use FIRST to write a predictive parser as follows
expr → term rest
rest → + term rest
     | - term rest
     | ε
procedure rest();
begin
  if lookahead in FIRST(+ term rest) then
    match('+'); term(); rest()
  else if lookahead in FIRST(- term rest) then
    match('-'); term(); rest()
  else return
end;
When a nonterminal A has two (or more) productions as in
A → α
  | β
then FIRST(α) and FIRST(β) must be disjoint for
predictive parsing to work
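As a rough illustration of the procedure rest, here is a small C recognizer for expr → term rest that reads from a string instead of standard input. It is a sketch under assumptions: term is taken to be a single digit (as in the translator later in this lecture), and the names recognize, input and ok are hypothetical. Since FIRST(+ term rest) = { + } and FIRST(- term rest) = { - }, the membership tests reduce to comparing the lookahead character.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

static const char *input;  /* remaining input; *input plays the role of lookahead */
static bool ok;            /* cleared on the first syntax error */

static void match(char t)
{
    if (ok && *input == t) input++;
    else ok = false;
}

/* term -> digit (assumption: single-digit terms) */
static void term(void)
{
    if (ok && isdigit((unsigned char)*input)) input++;
    else ok = false;
}

/* rest -> + term rest | - term rest | ε */
static void rest(void)
{
    if (!ok) return;
    if (*input == '+')      { match('+'); term(); rest(); }  /* lookahead in FIRST(+ term rest) */
    else if (*input == '-') { match('-'); term(); rest(); }  /* lookahead in FIRST(- term rest) */
    /* otherwise choose rest -> ε and consume nothing */
}

/* expr -> term rest; returns true when the whole string is an expr. */
bool recognize(const char *s)
{
    assert(s != NULL);
    input = s;
    ok = true;
    term();
    rest();
    return ok && *input == '\0';
}
```

Because the two FIRST sets are disjoint, one character of lookahead always selects exactly one branch, so the recognizer never backtracks.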
Left Factoring
When more than one production for nonterminal A starts
with the same symbols, the FIRST sets are not disjoint
stmt → if expr then stmt
     | if expr then stmt else stmt
We can use left factoring to fix the problem
stmt → if expr then stmt opt_else
opt_else → else stmt
         | ε
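The left-factored grammar can be parsed with one token of lookahead. The sketch below assumes a toy encoding that is not in the lecture: each token is one character ('i' for if, 'x' for an expression, 't' for then, 'e' for else) and 's' stands for some other, non-if statement, which is added here only so that the recursion has a base case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static const char *in;  /* remaining tokens; *in is the lookahead */
static bool ok;         /* cleared on the first syntax error */

static void match(char t)
{
    if (ok && *in == t) in++;
    else ok = false;
}

static void stmt(void);

/* opt_else -> else stmt | ε */
static void opt_else(void)
{
    if (ok && *in == 'e') { match('e'); stmt(); }
    /* otherwise opt_else -> ε */
}

/* stmt -> if expr then stmt opt_else
        |  s                      (assumed base case, not on the slide) */
static void stmt(void)
{
    if (!ok) return;
    if (*in == 'i') {
        match('i'); match('x'); match('t');
        stmt();
        opt_else();   /* the common prefix is parsed once; the choice is deferred to here */
    } else {
        match('s');
    }
}

/* Returns true when the whole token string is a single stmt. */
bool parse_stmt(const char *tokens)
{
    assert(tokens != NULL);
    in = tokens;
    ok = true;
    stmt();
    return ok && *in == '\0';
}
```

With this encoding, parse_stmt("ixtses") accepts an if-then-else, and in parse_stmt("ixtixtses") the else is consumed by the inner if, because opt_else is tried as soon as the inner stmt finishes.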

Left Recursion
When a production for nonterminal A starts with a
self reference, a predictive parser loops forever
A → A α
  | β
  | γ
We can eliminate left-recursive productions by systematically
rewriting the grammar using right-recursive productions
A → β R
  | γ R
R → α R
  | ε
or, writing A' for R:
A → β A'
  | γ A'
A' → α A'
   | ε

A Translator for Simple Expressions
expr → expr + term { print("+") }
expr → expr - term { print("-") }
expr → term
term → 0 { print("0") }
term → 1 { print("1") }
…
term → 9 { print("9") }
After left recursion elimination:

expr → term rest
rest → + term { print("+") } rest | - term { print("-") } rest | ε
term → 0 { print("0") }
term → 1 { print("1") }
…
term → 9 { print("9") }

Input: 1+2 (output: 12+)

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

int lookahead; /* current input character */

void error(void)
{ printf("Syntax error\n");
  exit(1);
}
void match(int t)
{ if (lookahead == t)
    lookahead = getchar();
  else error();
}
/* term → 0 { print("0") } | 1 { print("1") } | … | 9 { print("9") } */
void term(void)
{ if (isdigit(lookahead))
  { putchar(lookahead); match(lookahead);
  }
  else error();
}
/* expr → term rest
   rest → + term { print("+") } rest | - term { print("-") } rest | ε */
void expr(void)
{ term();
  while (1) /* optimized by inlining rest() and removing recursive calls */
  { if (lookahead == '+')
    { match('+'); term(); putchar('+');
    }
    else if (lookahead == '-')
    { match('-'); term(); putchar('-');
    }
    else break;
  }
}
int main(void)
{ lookahead = getchar();
  expr();
  return 0;
}
Another way to Code-1

Another way to Code-2
Eliminating Tail recursion in the procedure rest (Previous page)

Adding a Lexical Analyzer
• Typical tasks of the lexical analyzer:
– Remove white space and comments
– Encode constants as tokens
– Recognize keywords
– Recognize identifiers

Adding a Lexical Analyzer Cont…
– Remove white space and comments
Skipping white space
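A minimal sketch of the white-space skipping step in C, assuming the input is held in a string and that the scanner also keeps a line count for error messages. The function name skip_white_space and the line parameter are illustrative; comment removal is not handled here.

```c
#include <assert.h>
#include <stddef.h>

/* Advances past blanks, tabs and newlines and returns a pointer to the first
   character that is not white space. Each newline seen increments *line. */
const char *skip_white_space(const char *p, int *line)
{
    assert(p != NULL && line != NULL);
    for (;; p++) {
        if (*p == ' ' || *p == '\t')
            continue;            /* blank or tab: just skip it */
        else if (*p == '\n')
            (*line)++;           /* newline: skip it and count the line */
        else
            break;               /* start of the next token (or end of input) */
    }
    return p;
}
```

The parser never sees these characters, so the grammar does not have to mention white space at all.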

The Lexical Analyzer
Input: y := 31 + 28*x
The lexical analyzer lexan() turns the input into the token stream
<id, "y"> <assign, > <num, 31> <+, > <num, 28> <*, > <id, "x">
Each call returns one token to the parser parse(); the token's
attribute is passed in tokenval

Token Attributes
factor → ( expr )
       | num { print(num.value) }
#define NUM 256 /* token returned by lexan */
factor()
{ if (lookahead == '(')
  { match('('); expr(); match(')');
  }
  else if (lookahead == NUM)
  { printf(" %d ", tokenval); match(NUM);
  }
  else error();
}
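The function factor() relies on a lexan() that returns NUM and leaves the number's value in tokenval. One possible shape for such a lexan() is sketched below in C. It is an assumption-laden illustration: it scans a string rather than standard input, lexan_init and the DONE end-of-input token are hypothetical helpers, and identifiers and keywords are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define NUM  256   /* token returned by lexan for an integer constant */
#define DONE 257   /* hypothetical end-of-input token */

static const char *src;  /* remaining input text */
int tokenval;            /* attribute of the most recent NUM token */

/* Selects the string to be scanned. */
void lexan_init(const char *s)
{
    assert(s != NULL);
    src = s;
}

/* Returns the next token: NUM (value in tokenval), DONE at the end of the
   input, or the character itself for single-character tokens such as '+'. */
int lexan(void)
{
    while (*src == ' ' || *src == '\t' || *src == '\n')
        src++;                                   /* skip white space */

    if (*src == '\0')
        return DONE;

    if (isdigit((unsigned char)*src)) {
        tokenval = 0;
        while (isdigit((unsigned char)*src))     /* accumulate the decimal value */
            tokenval = tokenval * 10 + (*src++ - '0');
        return NUM;
    }

    return (unsigned char)*src++;                /* any other character is its own token */
}
```

Token codes for multi-character tokens start at 256 so that they cannot collide with the codes of single characters.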

Generic Instructions for Stack Manipulation
push v      push constant value v onto the stack
rvalue l    push contents of data location l
lvalue l    push address of data location l
pop         discard value on top of the stack
:=          the r-value on top is placed in the l-value below it,
            and both are popped
copy        push a copy of the top value on the stack
+           add value on top with value below it;
            pop both and push result
-           subtract value on top from value below it;
            pop both and push result
*, /, …     ditto for other arithmetic operations
<, &, …     ditto for relational and logical operations
Generic Control Flow Instructions
label l     label instruction with l
goto l      jump to instruction labeled l
gofalse l   pop the top value; if zero, jump to l
gotrue l    pop the top value; if nonzero, jump to l
halt        stop execution
jsr l       jump to subroutine labeled l, push return address
return      pop return address and return to caller
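To make the instruction set concrete, here is a small interpreter sketch in C for a subset of it. Several choices are assumptions made for the illustration only: labels are replaced by instruction indices, data locations are array indices, jsr and return are omitted, and the sample program (summing n, n-1, …, 1 with a gofalse/goto loop) is hypothetical.

```c
#include <assert.h>

enum opcode { PUSH, RVALUE, LVALUE, POP, ASSIGN, COPY, ADD, SUB,
              GOTO, GOFALSE, GOTRUE, HALT };

struct instr { enum opcode op; int arg; };

enum { LOC_I, LOC_SUM, MEM_SIZE };   /* data locations used by the sample program */
enum { STACK_SIZE = 64 };

static int memory[MEM_SIZE];
static int stack[STACK_SIZE];
static int top;                       /* number of values currently on the stack */

static void push(int v) { assert(top < STACK_SIZE); stack[top++] = v; }
static int  pop(void)   { assert(top > 0); return stack[--top]; }

/* Executes prog from instruction 0 until a HALT instruction is reached. */
void run(const struct instr *prog)
{
    int pc = 0;
    top = 0;
    for (;;) {
        struct instr in = prog[pc++];
        int a, b;
        switch (in.op) {
        case PUSH:    push(in.arg); break;
        case RVALUE:  push(memory[in.arg]); break;            /* contents of location */
        case LVALUE:  push(in.arg); break;                    /* address of location  */
        case POP:     pop(); break;
        case ASSIGN:  b = pop(); a = pop(); memory[a] = b; break;
        case COPY:    a = pop(); push(a); push(a); break;
        case ADD:     b = pop(); a = pop(); push(a + b); break;
        case SUB:     b = pop(); a = pop(); push(a - b); break;
        case GOTO:    pc = in.arg; break;
        case GOFALSE: if (pop() == 0) pc = in.arg; break;
        case GOTRUE:  if (pop() != 0) pc = in.arg; break;
        case HALT:    return;
        }
    }
}

/* Sample program: sum := 0; i := n; while i do sum := sum + i; i := i - 1 */
int sum_to(int n)
{
    static const struct instr prog[] = {
        /*  0 */ { RVALUE, LOC_I },    /* loop test: push i          */
        /*  1 */ { GOFALSE, 13 },      /* leave the loop when i == 0 */
        /*  2 */ { LVALUE, LOC_SUM },
        /*  3 */ { RVALUE, LOC_SUM },
        /*  4 */ { RVALUE, LOC_I },
        /*  5 */ { ADD, 0 },
        /*  6 */ { ASSIGN, 0 },        /* sum := sum + i */
        /*  7 */ { LVALUE, LOC_I },
        /*  8 */ { RVALUE, LOC_I },
        /*  9 */ { PUSH, 1 },
        /* 10 */ { SUB, 0 },
        /* 11 */ { ASSIGN, 0 },        /* i := i - 1 */
        /* 12 */ { GOTO, 0 },
        /* 13 */ { HALT, 0 },
    };
    assert(n >= 0);
    memory[LOC_I] = n;
    memory[LOC_SUM] = 0;
    run(prog);
    return memory[LOC_SUM];
}
```

For example, sum_to(4) runs the loop body four times and leaves 10 in the location for sum. A real implementation would resolve symbolic labels to instruction positions in a separate pass.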

L-value and R-value
What is the difference between an identifier on the left side and
on the right side of an assignment?
L-value vs. R-value of an identifier
I := 5;
I := I + 1;
L-value: location
R-value: contents
The right side specifies an integer value, while the left side
specifies where the value is to be stored.
Usually,
r-values are what we think of as values
l-values are locations.

Stack manipulation
push v      push v onto the stack
rvalue l    push contents of data location l
lvalue l    push address of data location l
pop         throw away value on top of the stack
:=          the r-value on top is placed in the l-value
            below it, and both are popped
copy        push a copy of the top value on the stack

Translation of Expressions
day = (1461*y) mod 4 + (153*m + 2) mod 5 + d
lvalue day
push 1461
rvalue y
*
push 4
mod
push 153
rvalue m
*
push 2
+
push 5
mod
+
rvalue d
+
:=
Data memory: day = 2, y = 1, m = 2, d = -3

Translation of Expressions (2)
day = (1461*y) mod 4 + (153*m + 2) mod 5 + d
Data memory: day=2, y=1, m=2, d=-3
Stack after each instruction (bottom of the stack on the left; the
slides draw the entry pushed by lvalue day as 2):
lvalue day    2
push 1461     2 1461
rvalue y      2 1461 1
*             2 1461
push 4        2 1461 4
mod           2 1
push 153      2 1 153
rvalue m      2 1 153 2
*             2 1 306
push 2        2 1 306 2
+             2 1 308
push 5        2 1 308 5
mod           2 1 3
+             2 4
rvalue d      2 4 -3
+             2 1
:=            (stack empty; day now holds 1)
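The trace can be checked by executing the same instruction sequence. The sketch below writes each stack instruction as a small C function and calls them in the order of the translation; the function names, the use of array indices as addresses, and run_day_program itself are assumptions made for this illustration.

```c
#include <assert.h>

enum { DAY, Y, M, D, N_LOCATIONS };   /* data locations; the index acts as the address */
enum { STACK_SIZE = 32 };

static int memory[N_LOCATIONS];
static int stack[STACK_SIZE];
static int top;                        /* number of values currently on the stack */

static void push(int v)   { assert(top < STACK_SIZE); stack[top++] = v; }
static int  pop(void)     { assert(top > 0); return stack[--top]; }
static void rvalue(int l) { push(memory[l]); }                       /* push contents of l */
static void lvalue(int l) { push(l); }                               /* push address of l  */
static void add(void)     { int b = pop(); int a = pop(); push(a + b); }
static void mul(void)     { int b = pop(); int a = pop(); push(a * b); }
static void mod(void)     { int b = pop(); int a = pop(); push(a % b); }
static void assign(void)  { int r = pop(); int l = pop(); memory[l] = r; }  /* := */

/* Runs the stack code for day = (1461*y) mod 4 + (153*m + 2) mod 5 + d
   and returns the value stored in day. */
int run_day_program(int y, int m, int d)
{
    top = 0;
    memory[Y] = y; memory[M] = m; memory[D] = d;

    lvalue(DAY);
    push(1461); rvalue(Y); mul(); push(4); mod();              /* (1461*y) mod 4    */
    push(153);  rvalue(M); mul(); push(2); add(); push(5); mod(); /* (153*m + 2) mod 5 */
    add();
    rvalue(D); add();                                          /* ... + d           */
    assign();                                                  /* day := result     */

    assert(top == 0);   /* := popped both remaining entries */
    return memory[DAY];
}
```

With the slide's data (y = 1, m = 2, d = -3), run_day_program(1, 2, -3) returns 1, which agrees with the last stack shown in the trace: 1461 mod 4 = 1, 308 mod 5 = 3, and 1 + 3 + (-3) = 1.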

The End