
Lecture# 08

Compiler Construction
A Simple Syntax-Directed Translator,
(Part-IV)

by Safdar Hussain

Topics
• FIRST Sets, Left Recursion, Left Factoring
• Translator for Simple Expressions, Adding a Lexical Analyzer
• Token Attributes, Generic Instructions for Stack Manipulation
Overview
• This chapter contains introductory material
• Building a simple compiler
– Syntax Definition
– Syntax-Directed Translation
– Parsing
– A Translator for Simple Expressions
– The Lexical Analyzer

Safdar Hussain, Khwaja Fareed University, RYK 2


Top-down Parsing (Key Points)
• In general, selecting a production for a nonterminal may involve trial and error: we may have to try one production and backtrack to try another if the first proves unsuitable.
• A production is unsuitable if, after using it, we cannot complete the tree to match the input string.
• Predictive parsing is a special case of top-down parsing that does not need backtracking.




Predictive Parsing
• Recursive descent parsing is a top-down parsing
method
– Every nonterminal has one (recursive) procedure
responsible for parsing the nonterminal’s syntactic
category of input tokens
– When a nonterminal has multiple productions, each
production is implemented in a branch of a selection
statement based on input lookahead information
• Predictive parsing is a special form of recursive
descent parsing where we use one lookahead token
to unambiguously determine the parse operations



FIRST
• FIRST(α) is the set of terminals that appear as the
first symbols of one or more strings generated from α
• If α is ε or can generate ε, then ε is also in FIRST(α)

type → simple
     | ^ id
     | array [ simple ] of type
simple → integer
       | char
       | num dotdot num

FIRST(simple) = { integer, char, num }
FIRST(^ id) = { ^ }
FIRST(type) = { integer, char, num, ^, array }
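
The FIRST sets above can be reproduced mechanically. The following is a minimal sketch, not the lecture's own code: it assumes no production body is empty (true for this grammar), so FIRST of a nonterminal only has to absorb FIRST of the leftmost symbol of each body until nothing changes. The names `compute_first`, `print_first`, `first`, `grammar` and the enum constants are chosen here for illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* Terminals come first so they can index the columns of first[][]. */
enum symbol {
    INTEGER, CHAR_T, NUM, DOTDOT, CARET, ID, ARRAY, LBRACKET, RBRACKET, OF,
    N_TERMINALS,
    TYPE = N_TERMINALS, SIMPLE,
    N_SYMBOLS
};

static const char *const terminal_name[N_TERMINALS] = {
    "integer", "char", "num", "dotdot", "^", "id", "array", "[", "]", "of"
};

struct production { int head; int len; int body[6]; };

/* The type/simple grammar from the slide. */
static const struct production grammar[] = {
    { TYPE,   1, { SIMPLE } },
    { TYPE,   2, { CARET, ID } },
    { TYPE,   6, { ARRAY, LBRACKET, SIMPLE, RBRACKET, OF, TYPE } },
    { SIMPLE, 1, { INTEGER } },
    { SIMPLE, 1, { CHAR_T } },
    { SIMPLE, 3, { NUM, DOTDOT, NUM } },
};
enum { N_PRODUCTIONS = sizeof grammar / sizeof grammar[0] };

/* first[X][t] is true when terminal t can begin a string derived from X. */
static bool first[N_SYMBOLS][N_TERMINALS];

void compute_first(void)
{
    /* FIRST of a terminal is the terminal itself. */
    for (int t = 0; t < N_TERMINALS; t++)
        first[t][t] = true;

    /* Assumption: no body is empty, so only the leftmost body symbol
       contributes. Repeat until a full pass adds nothing new. */
    bool changed = true;
    while (changed) {
        changed = false;
        for (int p = 0; p < N_PRODUCTIONS; p++) {
            int head = grammar[p].head;
            int lead = grammar[p].body[0];
            for (int t = 0; t < N_TERMINALS; t++) {
                if (first[lead][t] && !first[head][t]) {
                    first[head][t] = true;
                    changed = true;
                }
            }
        }
    }
}

/* Prints one FIRST set in the notation used on the slide. */
void print_first(const char *label, int symbol)
{
    printf("FIRST(%s) = {", label);
    for (int t = 0; t < N_TERMINALS; t++)
        if (first[symbol][t])
            printf(" %s", terminal_name[t]);
    printf(" }\n");
}
```

Calling `compute_first()` and then `print_first("type", TYPE)` lists the members of FIRST(type). A grammar with ε-productions would need the fuller rule that looks past nullable symbols, which this sketch deliberately leaves out.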



Using FIRST
We use FIRST to write a predictive parser as follows

expr → term rest
rest → + term rest
     | - term rest
     | ε

procedure rest();
begin
    if lookahead in FIRST(+ term rest) then
        match('+'); term(); rest()
    else if lookahead in FIRST(- term rest) then
        match('-'); term(); rest()
    else return
end;

When a nonterminal A has two (or more) productions as in

A → α
  | β

then FIRST(α) and FIRST(β) must be disjoint for
predictive parsing to work
Left Factoring
When more than one production for nonterminal A starts
with the same symbols, the FIRST sets are not disjoint

stmt → if expr then stmt
     | if expr then stmt else stmt

We can use left factoring to fix the problem

stmt → if expr then stmt opt_else
opt_else → else stmt
         | ε
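
The left-factored grammar can be parsed with one token of lookahead, as the sketch below suggests. It is an illustration under stated assumptions, not code from the lecture: tokens are single characters ('i' for if, 't' for then, 'e' for else, 'x' standing in for a whole expr), and an extra alternative stmt → 's' (some simple statement) is added so that the recursion has a base case. The names `parse`, `stmt`, `opt_else`, `input` and `ok` are hypothetical.

```c
#include <stdbool.h>

static const char *input;   /* remaining tokens; *input is the lookahead */
static bool ok;             /* cleared on the first syntax error */

static void match(char t)
{
    if (*input == t)
        input++;
    else
        ok = false;
}

static void stmt(void);

/* opt_else -> else stmt | ε */
static void opt_else(void)
{
    if (ok && *input == 'e') {
        match('e');
        stmt();
    }
    /* otherwise take the ε production and consume nothing */
}

/* stmt -> if expr then stmt opt_else | 's'   ('s' is an assumed base case) */
static void stmt(void)
{
    if (!ok)
        return;
    if (*input == 'i') {
        match('i'); match('x'); match('t');
        stmt();
        opt_else();
    } else {
        match('s');
    }
}

/* Returns true when tokens is exactly one stmt. */
bool parse(const char *tokens)
{
    input = tokens;
    ok = true;
    stmt();
    return ok && *input == '\0';
}
```

With this encoding, `parse("ixts")` accepts an if without else and `parse("ixtses")` accepts one with else; in both cases the choice between the two original productions is postponed until the lookahead is either 'e' or something else.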



Left Recursion
When a production for nonterminal A starts with a
self reference then a predictive parser loops forever
A → A α
  | β
  | γ
We can eliminate left-recursive productions by systematically
rewriting the grammar using right-recursive productions

A → β R
  | γ R
R → α R
  | ε

or, writing A' for R:

A → β A'
  | γ A'
A' → α A'
   | ε



A Translator for Simple Expressions
expr → expr + term { print("+") }
expr → expr - term { print("-") }
expr → term
term → 0 { print("0") }
term → 1 { print("1") }
…
term → 9 { print("9") }

After left recursion elimination:

expr → term rest
rest → + term { print("+") } rest | - term { print("-") } rest | ε
term → 0 { print("0") }
term → 1 { print("1") }
…
term → 9 { print("9") }



Input: 1+2
Output: 12+

Translation scheme implemented by the code:

expr → term rest
rest → + term { print("+") } rest
     | - term { print("-") } rest
     | ε
term → 0 { print("0") }
term → 1 { print("1") }
…
term → 9 { print("9") }

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

int lookahead;                 /* current input character */

void error(void)
{ printf("Syntax error\n");
  exit(1);
}
void match(int t)              /* consume t or report an error */
{ if (lookahead == t)
    lookahead = getchar();
  else error();
}
void term(void)                /* term → digit { print(digit) } */
{ if (isdigit(lookahead))
  { putchar(lookahead); match(lookahead);
  }
  else error();
}
void expr(void)                /* expr → term rest */
{ term();
  while (1) /* optimized by inlining rest() and removing recursive calls */
  { if (lookahead == '+')
    { match('+'); term(); putchar('+');
    }
    else if (lookahead == '-')
    { match('-'); term(); putchar('-');
    }
    else break;
  }
}
int main(void)
{ lookahead = getchar();
  expr();
  return 0;
}
Another way to Code-1



Another way to Code-2

Eliminating Tail recursion in the procedure rest (Previous page)
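
The original figure for this slide did not survive, so the following is only a sketch of the idea and not a reconstruction of it. In rest(), the recursive call is the last action of the procedure, so it can be replaced by a jump back to the top, that is, by a loop. To keep the two versions comparable, this sketch reads from a string and writes postfix into a buffer instead of using getchar and putchar; the names `to_postfix`, `rest_recursive` and `rest_loop` are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>

static const char *in;   /* next unread input character (the lookahead) */
static char *out;        /* next free slot in the output buffer */
static bool ok;          /* cleared on the first syntax error */

/* term -> digit { print(digit) } */
static void term(void)
{
    if (ok && isdigit((unsigned char)*in))
        *out++ = *in++;
    else
        ok = false;
}

/* rest -> + term { print('+') } rest | - term { print('-') } rest | ε */
static void rest_recursive(void)
{
    if (ok && (*in == '+' || *in == '-')) {
        char op = *in++;
        term();
        *out++ = op;
        rest_recursive();   /* tail call: nothing happens after it */
    }
}

/* Same procedure with the tail call turned into a loop. */
static void rest_loop(void)
{
    while (ok && (*in == '+' || *in == '-')) {
        char op = *in++;
        term();
        *out++ = op;
    }
}

/* Translates infix to postfix. The postfix buffer must hold at least
   strlen(infix) + 1 characters. Returns false on a syntax error. */
bool to_postfix(const char *infix, char *postfix, bool use_loop)
{
    in = infix;
    out = postfix;
    ok = true;
    term();
    if (use_loop)
        rest_loop();
    else
        rest_recursive();
    *out = '\0';
    return ok && *in == '\0';
}
```

Both settings of `use_loop` translate `9-5+2` to `95-2+`; the loop version simply avoids one stack frame per operator.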



Adding a Lexical Analyzer
• Typical tasks of the lexical analyzer:
– Remove white space and comments
– Encode constants as tokens
– Recognize keywords
– Recognize identifiers



Adding a Lexical Analyzer Cont…
– Remove white space and comments

Skipping white space
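
The code for this slide is missing from the extracted text. A minimal sketch of white-space skipping, assuming the input is held in a C string and that the caller wants a line count for error messages, might look like this. The function name `skip_white_space` and its parameters are hypothetical, and comments are not handled.

```c
/* Advances past blanks, tabs, and newlines, counting newlines in *lineno.
   Returns a pointer to the first character that is not white space
   (possibly the terminating '\0'). */
const char *skip_white_space(const char *p, int *lineno)
{
    for (;; p++) {
        if (*p == ' ' || *p == '\t')
            continue;            /* ignore blanks and tabs */
        else if (*p == '\n')
            (*lineno)++;         /* a newline also advances the line count */
        else
            break;               /* start of the next token, or end of input */
    }
    return p;
}
```

A lexical analyzer would call this at the start of each request for a token, so that the parser never sees white space.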



The Lexical Analyzer
y := 31 + 28*x
    ↓
Lexical analyzer: lexan()
    ↓
<id, "y"> <assign, > <num, 31> <+, > <num, 28> <*, > <id, "x">
    ↓ token, tokenval (token attribute)
Parser: parse()
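
A lexical analyzer that produces this token stream can be sketched as follows. This is an illustration under assumptions rather than the lecture's lexan(): the input is a C string, NUM is 256 as on the Token Attributes slide, and the names `lex_init`, `lexeme`, `ID`, `ASSIGN` and `DONE` are introduced here. Numbers put their value in `tokenval`, identifiers put their spelling in `lexeme`, and any other character is returned as its own token.

```c
#include <ctype.h>
#include <stddef.h>

/* Token codes start above every possible character value. */
enum { NUM = 256, ID, ASSIGN, DONE };

#define MAX_LEXEME 32

static const char *src;            /* next unread character */
static int tokenval;               /* attribute of a NUM token */
static char lexeme[MAX_LEXEME];    /* attribute of an ID token */

void lex_init(const char *text)
{
    src = text;
}

int lexan(void)
{
    /* Remove white space. */
    while (*src == ' ' || *src == '\t' || *src == '\n')
        src++;

    if (*src == '\0')
        return DONE;

    /* Encode a constant as the token NUM with its value as attribute. */
    if (isdigit((unsigned char)*src)) {
        tokenval = 0;
        while (isdigit((unsigned char)*src))
            tokenval = tokenval * 10 + (*src++ - '0');
        return NUM;
    }

    /* Recognize an identifier; overlong names are truncated. */
    if (isalpha((unsigned char)*src)) {
        size_t n = 0;
        while (isalnum((unsigned char)*src)) {
            if (n < MAX_LEXEME - 1)
                lexeme[n++] = *src;
            src++;
        }
        lexeme[n] = '\0';
        return ID;
    }

    /* The two-character assignment operator. */
    if (src[0] == ':' && src[1] == '=') {
        src += 2;
        return ASSIGN;
    }

    /* Any other character, such as + or *, is its own token. */
    return (unsigned char)*src++;
}
```

After `lex_init("y := 31 + 28*x")`, successive calls to `lexan()` return ID, ASSIGN, NUM, '+', NUM, '*', ID and finally DONE, with `tokenval` holding 31 and then 28 after the two NUM tokens.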



Token Attributes
factor → ( expr )
       | num { print(num.value) }

#define NUM 256 /* token returned by lexan */

/* lookahead, tokenval, match(), expr() and error() are defined elsewhere */
void factor(void)
{ if (lookahead == '(')
  { match('('); expr(); match(')');
  }
  else if (lookahead == NUM)
  { printf(" %d ", tokenval); match(NUM);
  }
  else error();
}



Generic Instructions for Stack Manipulation
push v      push constant value v onto the stack
rvalue l    push contents of data location l
lvalue l    push address of data location l
pop         discard value on top of the stack
:=          the r-value on top is placed in the l-value below it
            and both are popped
copy        push a copy of the top value on the stack
+           add value on top with value below it;
            pop both and push result
-           subtract value on top from value below it;
            pop both and push result
*, /, …     ditto for other arithmetic operations
<, &, …     ditto for relational and logical operations
Generic Control Flow Instructions
label l     label instruction with l
goto l      jump to instruction labeled l
gofalse l   pop the top value; if zero then jump to l
gotrue l    pop the top value; if nonzero then jump to l
halt        stop execution
jsr l       jump to subroutine labeled l, push return address
return      pop return address and return to caller
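
To make the jump instructions concrete, here is a small interpreter sketch. It is an assumption-laden illustration, not part of the lecture: it covers only push, copy, -, label, goto, gofalse, gotrue and halt (jsr and return are omitted), labels are plain integers, and the names `run`, `find_label`, `struct instr` and `countdown` are made up for this example.

```c
#include <stdbool.h>

enum op { PUSH, COPY, SUB, LABEL, GOTO, GOFALSE, GOTRUE, HALT };
struct instr { enum op op; int arg; };   /* arg: constant or label number */

#define STACK_MAX 64

/* Index of the instruction "label l", or -1 if there is none. */
static int find_label(const struct instr *prog, int n, int l)
{
    for (int i = 0; i < n; i++)
        if (prog[i].op == LABEL && prog[i].arg == l)
            return i;
    return -1;
}

/* Runs prog. On halt, stores the top of the stack in *top and returns true.
   Returns false on a missing label, a stack error, or a missing halt. */
bool run(const struct instr *prog, int n, int *top)
{
    int stack[STACK_MAX];
    int sp = 0;              /* number of values currently on the stack */
    int pc = 0;              /* index of the next instruction */

    while (pc >= 0 && pc < n) {
        struct instr in = prog[pc++];
        int v;
        switch (in.op) {
        case PUSH:
            if (sp == STACK_MAX) return false;
            stack[sp++] = in.arg;
            break;
        case COPY:
            if (sp == 0 || sp == STACK_MAX) return false;
            stack[sp] = stack[sp - 1];
            sp++;
            break;
        case SUB:            /* value below minus value on top */
            if (sp < 2) return false;
            stack[sp - 2] -= stack[sp - 1];
            sp--;
            break;
        case LABEL:          /* a label only marks a position */
            break;
        case GOTO:
            pc = find_label(prog, n, in.arg);
            break;
        case GOFALSE:
        case GOTRUE:
            if (sp == 0) return false;
            v = stack[--sp];
            if ((v == 0) == (in.op == GOFALSE))
                pc = find_label(prog, n, in.arg);
            break;
        case HALT:
            if (sp == 0) return false;
            *top = stack[sp - 1];
            return true;
        }
    }
    return false;
}

/* while (top != 0) top = top - 1;  starting from 3 */
static const struct instr countdown[] = {
    { PUSH, 3 }, { LABEL, 1 }, { COPY, 0 }, { GOFALSE, 2 },
    { PUSH, 1 }, { SUB, 0 }, { GOTO, 1 }, { LABEL, 2 }, { HALT, 0 },
};
```

The `countdown` program shows the usual shape of a translated while loop: a label at the top, a test followed by gofalse to the exit label, the body, and a goto back to the top. The copy before gofalse is needed because the conditional jumps pop the value they test.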



L-value and R-value
What is the difference between an identifier on the left side and
on the right side of an assignment?
L-value vs. R-value of an identifier
I := 5;         L: location
I := I + 1;     R: contents
The right side specifies an integer value, while the left side
specifies where the value is to be stored.
Usually,
r-values are what we think of as values;
l-values are locations.



Stack manipulation
push v      push v onto the stack
rvalue l    push contents of data location l
lvalue l    push address of data location l
pop         throw away value on top of the stack
:=          the r-value on top is placed in the l-value
            below it and both are popped
copy        push a copy of the top value on the stack



Translation of Expressions
day := (1461*y) mod 4 + (153*m + 2) mod 5 + d

Data locations: day = 2, y = 1, m = 2, d = -3

Translated code:
lvalue day
push 1461
rvalue y
*
push 4
mod
push 153
rvalue m
*
push 2
+
push 5
mod
+
rvalue d
+
:=
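
The translated code can be checked by executing it on a small stack machine. The sketch below is one possible interpreter for just the instructions this example uses, written under assumptions that are not in the lecture: data locations are array indices (so an l-value is an index into `data`), `mod` is C's `%` operator, and the names `run`, `struct instr`, `day_code` and the enum constants are hypothetical.

```c
#include <stdbool.h>

enum op { PUSH, RVALUE, LVALUE, ASSIGN, ADD, MUL, MOD };
struct instr { enum op op; int arg; };   /* arg: constant or data location */

enum { DAY, Y, M, D, N_VARS };           /* data locations used as addresses */

#define STACK_MAX 32

/* Executes prog against data[]. Returns true if every instruction ran
   without a stack or address error and the stack ended up empty. */
bool run(const struct instr *prog, int n, int data[N_VARS])
{
    int stack[STACK_MAX];
    int sp = 0;                          /* number of values on the stack */

    for (int pc = 0; pc < n; pc++) {
        struct instr in = prog[pc];
        switch (in.op) {
        case PUSH:
        case LVALUE:
        case RVALUE:
            if (sp == STACK_MAX) return false;
            if (in.op != PUSH && (in.arg < 0 || in.arg >= N_VARS)) return false;
            /* rvalue pushes the contents, lvalue pushes the address itself */
            stack[sp++] = (in.op == RVALUE) ? data[in.arg] : in.arg;
            break;
        case ASSIGN: {
            /* r-value on top is stored at the l-value below it; pop both */
            if (sp < 2) return false;
            int addr = stack[sp - 2];
            if (addr < 0 || addr >= N_VARS) return false;
            data[addr] = stack[sp - 1];
            sp -= 2;
            break;
        }
        case ADD:
        case MUL:
        case MOD: {
            /* pop both operands and push the result */
            if (sp < 2) return false;
            int below = stack[sp - 2], top = stack[sp - 1];
            if (in.op == MOD && top == 0) return false;
            stack[sp - 2] = in.op == ADD ? below + top
                          : in.op == MUL ? below * top
                          : below % top;
            sp--;
            break;
        }
        }
    }
    return sp == 0;
}

/* day := (1461*y) mod 4 + (153*m + 2) mod 5 + d */
static const struct instr day_code[] = {
    { LVALUE, DAY }, { PUSH, 1461 }, { RVALUE, Y }, { MUL, 0 },
    { PUSH, 4 }, { MOD, 0 },
    { PUSH, 153 }, { RVALUE, M }, { MUL, 0 }, { PUSH, 2 }, { ADD, 0 },
    { PUSH, 5 }, { MOD, 0 },
    { ADD, 0 }, { RVALUE, D }, { ADD, 0 }, { ASSIGN, 0 },
};
```

Starting from `int data[N_VARS] = { 2, 1, 2, -3 }`, running `day_code` leaves `data[DAY]` equal to 1: 1461 mod 4 is 1, 308 mod 5 is 3, and 1 + 3 + (-3) is 1.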



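Translation of Expressions (2)
day := (1461*y) mod 4 + (153*m + 2) mod 5 + d
Data locations: day=2, y=1, m=2, d=-3

Stack after each instruction (bottom of stack on the left; the
slides draw the entry pushed by lvalue day as 2):
lvalue day      2
push 1461       2 1461
rvalue y        2 1461 1
*               2 1461
push 4          2 1461 4
mod             2 1
push 153        2 1 153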



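Translation of Expressions (3)
day := (1461*y) mod 4 + (153*m + 2) mod 5 + d
Data locations: day=2, y=1, m=2, d=-3

Stack after each instruction (bottom of stack on the left):
rvalue m        2 1 153 2
*               2 1 306
push 2          2 1 306 2
+               2 1 308
push 5          2 1 308 5
mod             2 1 3
+               2 4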



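Translation of Expressions (4)
day := (1461*y) mod 4 + (153*m + 2) mod 5 + d
Data locations: day=2, y=1, m=2, d=-3

Stack after each instruction (bottom of stack on the left):
rvalue d        2 4 -3
+               2 1
:=              (empty; the value 1 is stored in day)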



The End
