A SIMPLE SYNTAX-DIRECTED TRANSLATOR
Chapter 2
Syntax Definition
• A context-free (CF) grammar is a 4-tuple consisting of
– A set of tokens (terminal symbols)
– A set of nonterminals
– A set of productions
– A designated start symbol
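The four components can be written down directly as a data structure. The sketch below is only an illustration: the names Grammar, list_grammar, and is_well_formed are invented here, and the grammar itself is assumed to be the list/digit grammar that the derivation of 9-5+2 in this chapter appears to use.

```python
from typing import NamedTuple


class Grammar(NamedTuple):
    terminals: frozenset     # the tokens
    nonterminals: frozenset
    productions: dict        # nonterminal -> list of bodies (tuples of symbols)
    start: str               # the designated start symbol


DIGITS = tuple("0123456789")

# Assumed example grammar:
#   list  -> list + digit | list - digit | digit
#   digit -> 0 | 1 | ... | 9
list_grammar = Grammar(
    terminals=frozenset(DIGITS) | {"+", "-"},
    nonterminals=frozenset({"list", "digit"}),
    productions={
        "list": [("list", "+", "digit"), ("list", "-", "digit"), ("digit",)],
        "digit": [(d,) for d in DIGITS],
    },
    start="list",
)


def is_well_formed(g: Grammar) -> bool:
    """Check that the four components are consistent with each other."""
    symbols = g.terminals | g.nonterminals
    return (
        g.start in g.nonterminals
        and not (g.terminals & g.nonterminals)
        and all(head in g.nonterminals for head in g.productions)
        and all(
            sym in symbols
            for bodies in g.productions.values()
            for body in bodies
            for sym in body
        )
    )


print(is_well_formed(list_grammar))
```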
Example Grammar
Derivation
• Given a CF grammar we can determine the
set of all strings (sequences of tokens)
generated by the grammar using derivation
– We begin with the start symbol
– In each step, we replace one nonterminal in the
current sentential form with one of the right-
hand sides of a production for that nonterminal
list
⇒ list + digit
⇒ list - digit + digit
⇒ digit - digit + digit
⇒ 9 - digit + digit
⇒ 9 - 5 + digit
⇒ 9 - 5 + 2
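The derivation of 9 - 5 + 2 can be replayed mechanically. In this minimal sketch the function derive and the steps list are illustrative names, and each step is assumed to be a valid production of the list/digit grammar; the function always rewrites the leftmost occurrence of the named nonterminal.

```python
def derive(start_form, steps):
    """Return every sentential form produced by applying the steps in order.

    Each step is (nonterminal, body); the leftmost occurrence of the
    nonterminal in the current sentential form is replaced by the body.
    """
    forms = [list(start_form)]
    for head, body in steps:
        current = forms[-1]
        i = current.index(head)  # leftmost occurrence of the nonterminal
        forms.append(current[:i] + list(body) + current[i + 1:])
    return forms


steps = [
    ("list", ["list", "+", "digit"]),
    ("list", ["list", "-", "digit"]),
    ("list", ["digit"]),
    ("digit", ["9"]),
    ("digit", ["5"]),
    ("digit", ["2"]),
]

forms = derive(["list"], steps)
for form in forms:
    print(" ".join(form))
```

Each printed line is one sentential form; the last one contains only tokens, so it is a string generated by the grammar.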
Parse Trees
• The root of the tree is labeled by the start symbol
• Each leaf of the tree is labeled by a terminal (= token) or ε
• Each interior node is labeled by a nonterminal
• If A → X1 X2 … Xn is a production, then node A has immediate children X1, X2, …, Xn where Xi is a (non)terminal or ε (ε denotes the empty string)
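[Figure: parse tree for 9 - 5 + 2. The root list has children list, +, and digit (2); its left child list has children list, -, and digit (5); the innermost list derives digit (9).]
The sequence of leaves, read from left to right (9 - 5 + 2), is called the yield of the parse tree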
The Two Derivations for x – 2 * y
Leftmost derivation                    Rightmost derivation
Rule  Sentential Form                  Rule  Sentential Form
-     Expr                             -     Expr
1     Expr Op Expr                     1     Expr Op Expr
3     <id,x> Op Expr                   3     Expr Op <id,y>
5     <id,x> – Expr                    6     Expr * <id,y>
1     <id,x> – Expr Op Expr            1     Expr Op Expr * <id,y>
2     <id,x> – <num,2> Op Expr         2     Expr Op <num,2> * <id,y>
6     <id,x> – <num,2> * Expr          5     Expr – <num,2> * <id,y>
3     <id,x> – <num,2> * <id,y>        3     <id,x> – <num,2> * <id,y>
The leftmost derivation evaluates as x – ( 2 * y )
Derivations and Parse Trees
Rightmost derivation
Rule  Sentential Form
-     Expr
1     Expr Op Expr
3     Expr Op <id,y>
6     Expr * <id,y>
1     Expr Op Expr * <id,y>
2     Expr Op <num,2> * <id,y>
5     Expr – <num,2> * <id,y>
3     <id,x> – <num,2> * <id,y>
[Figure: parse tree G for this derivation. The root E has children E, Op (*), and y; the left E expands to E Op E, which yields x – 2.]
This evaluates as ( x – 2 ) * y
Ambiguity
Ambiguity (cont’d)
[Figure: two different parse trees, both rooted at string, for the same input 9 - 5 + 2]
Associativity of Operators
Left-associative operators have left-recursive productions
left → left + term | term
String a+b+c has the same meaning as (a+b)+c
Right-associative operators have right-recursive productions
right → term = right | term
String a=b=c has the same meaning as a=(b=c)
Operators on the same line have the same associativity and precedence:
left-associative: + -
left-associative: * /
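The effect of the two recursion directions can be made visible by inserting the implied parentheses. This is a small sketch under simplifying assumptions: tokens are single characters that alternate operand, operator, operand, and the function names group_left and group_right are invented for illustration.

```python
def group_left(tokens):
    """left -> left op term | term: each new operand joins everything to its left."""
    result = tokens[0]
    for i in range(1, len(tokens), 2):
        result = f"({result}{tokens[i]}{tokens[i + 1]})"
    return result


def group_right(tokens):
    """right -> term op right | term: the first operand joins everything to its right."""
    if len(tokens) == 1:
        return tokens[0]
    return f"({tokens[0]}{tokens[1]}{group_right(tokens[2:])})"


print(group_left(list("a+b+c")))
print(group_right(list("a=b=c")))
print(group_left(list("9-5+2")), group_right(list("9-5+2")))
```

For 9-5+2 the two groupings have different values, which is why the grammar has to fix the associativity of - and +.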
Precedence of Operators
Operators with higher precedence “bind more tightly”
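expr → expr + term | expr – term | term
term → term * factor | term / factor | factor
factor → number | ( expr )
number → 0 | 1 | 2 | … | 9
String 2+3*5 has the same meaning as 2+(3*5)
[Figure: parse tree for 2 + 3 * 5. The root expr has children expr, +, and term; the left expr derives 2 through term, factor, and number, and the right term derives 3 * 5 as term * factor.]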
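One way to see the precedence levels at work is a small evaluator with one function per nonterminal. This is a sketch, not the chapter's translator: the left-recursive productions are assumed to be rewritten as loops (a standard transformation for recursive descent), the ASCII characters + - * / are used as operators, and all function names are illustrative.

```python
def evaluate(text):
    """Evaluate single-digit arithmetic using the expr/term/factor layering."""
    tokens = [c for c in text if not c.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def factor():            # factor -> number | ( expr )
        tok = peek()
        if tok == "(":
            advance()
            value = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if tok is not None and tok.isdigit():
            advance()
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():              # term -> term * factor | term / factor | factor
        value = factor()
        while peek() in ("*", "/"):
            op = peek()
            advance()
            rhs = factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def expr():              # expr -> expr + term | expr - term | term
        value = term()
        while peek() in ("+", "-"):
            op = peek()
            advance()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expr()
    if peek() is not None:
        raise SyntaxError("unexpected trailing input")
    return result


print(evaluate("2+3*5"))
```

Because expr only ever combines whole terms, the multiplication inside a term is finished before the addition sees it.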
Syntax of Statements
stmt → id := expr
| if expr then stmt
| if expr then stmt else stmt
| while expr do stmt
| begin opt_stmts end
opt_stmts → stmt ; opt_stmts
| ε
Syntax-Directed Translation
• Uses a CF grammar to specify the syntactic
structure of the language
• AND associates a set of attributes with the
terminals and nonterminals of the grammar
• AND associates with each production a set of
semantic rules to compute values of attributes
• A parse tree is traversed and semantic rules
applied: after the tree traversal(s) are completed,
the attribute values on the nonterminals contain
the translated form of the input
[Figure: annotated parse tree for 9 - 5 + 2; the root carries expr.t = “95-2+” and the leftmost term carries term.t = “9”]
Depth-First Traversals
procedure visit(n : node);
begin
    for each child m of n, from left to right do
        visit(m);
    evaluate semantic rules at node n
end
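The visit procedure can be tried out on the annotated tree for 9 - 5 + 2. The sketch below assumes each node stores its semantic rule as a function and its attribute as t; the class Node and the helpers term_node, expr_from_term, and expr_binary are invented for this illustration.

```python
class Node:
    def __init__(self, label, children=(), rule=None):
        self.label = label
        self.children = list(children)
        self.rule = rule   # semantic rule for this node, or None for tokens
        self.t = None      # attribute computed by the rule


def visit(n):
    """Depth-first traversal: children left to right, then the node itself."""
    for m in n.children:
        visit(m)
    if n.rule is not None:
        n.t = n.rule(n)


def term_node(digit):
    # term -> digit        term.t = digit
    return Node("term", [Node(digit)], rule=lambda n: n.children[0].label)


def expr_from_term(term):
    # expr -> term         expr.t = term.t
    return Node("expr", [term], rule=lambda n: n.children[0].t)


def expr_binary(left, op, term):
    # expr -> expr op term expr.t = expr1.t || term.t || op
    return Node(
        "expr",
        [left, Node(op), term],
        rule=lambda n: n.children[0].t + n.children[2].t + n.children[1].label,
    )


root = expr_binary(
    expr_binary(expr_from_term(term_node("9")), "-", term_node("5")),
    "+",
    term_node("2"),
)
visit(root)
print(root.t)
```

Because a node's rule runs only after all of its children have been visited, every attribute a rule reads is already available.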
Translation Schemes
• A translation scheme is a CF grammar embedded
with semantic actions
• When drawing a parse tree for a translation scheme,
we indicate an action by constructing an extra child
for it, connected by a dashed line to the node that
corresponds to the head of the production.
[Figure: parse tree for 9-5+2 with each action drawn as an extra child of its production. In depth-first order the actions are { print(“9”) }, { print(“5”) }, { print(“-”) }, { print(“2”) }, { print(“+”) }.]
Translates 9-5+2 into postfix 95-2+
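A translation scheme can also be executed directly while parsing, without building the tree. The following is a minimal sketch under assumptions: the left-recursive expr productions are rewritten as a loop, operands are single digits, and the print actions are modelled by appending to a list so the result can be inspected. The function name to_postfix is illustrative.

```python
def to_postfix(text):
    """Translate single-digit infix expressions with + and - into postfix."""
    tokens = [c for c in text if not c.isspace()]
    pos = 0
    out = []

    def term():
        nonlocal pos
        if pos < len(tokens) and tokens[pos].isdigit():
            out.append(tokens[pos])   # action { print(digit) }
            pos += 1
        else:
            raise SyntaxError(f"digit expected at position {pos}")

    def expr():
        nonlocal pos
        term()
        while pos < len(tokens) and tokens[pos] in ("+", "-"):
            op = tokens[pos]
            pos += 1
            term()
            out.append(op)            # action { print(op) }, after both operands

    expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return "".join(out)


print(to_postfix("9-5+2"))
```

The operator action sits at the end of its production body, so it fires only after both operands have been emitted, which is exactly what postfix order requires.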
2.4 Parsing
• Parsing = the process of determining whether a string of tokens can be generated by a grammar.
• The input program is a string of characters, but it is split into tokens before parsing.
• For any CF grammar there is a parser that takes at most O(n^3) time to parse a string of n tokens.
– General parsing algorithms use three nested loops over the input (why does this give n^3?)
• Top-down parsing “constructs” a parse tree from root
(start symbol) to leaves.
• Bottom-up parsing “constructs” a parse tree from leaves
to root.
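The O(n^3) bound can be seen in the CYK algorithm, one well-known general parsing method; the slides do not say which algorithm they have in mind, so this choice is an assumption. CYK needs the grammar in Chomsky normal form, so the sketch uses a hand-converted version of the list/digit grammar with the invented nonterminals L, R, P, and D.

```python
def cyk(tokens, unit, binary, start):
    """Return True if `start` derives `tokens`.

    unit:   terminal -> set of nonterminals A with a production A -> terminal
    binary: (B, C)   -> set of nonterminals A with a production A -> B C
    """
    n = len(tokens)
    if n == 0:
        return False
    # table[i][k] holds the nonterminals that derive tokens[i : i + k + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][0] = set(unit.get(tok, ()))
    for length in range(2, n + 1):             # loop 1: substring length
        for i in range(n - length + 1):        # loop 2: substring start
            for split in range(1, length):     # loop 3: where to cut it in two
                for b in table[i][split - 1]:
                    for c in table[i + split][length - split - 1]:
                        table[i][length - 1] |= binary.get((b, c), set())
    return start in table[0][n - 1]


# Assumed normal form of: list -> list + digit | list - digit | digit
#   L -> L R | 0..9      R -> P D      P -> + | -      D -> 0..9
unit = {d: {"L", "D"} for d in "0123456789"}
unit["+"] = {"P"}
unit["-"] = {"P"}
binary = {("L", "R"): {"L"}, ("P", "D"): {"R"}}

print(cyk(list("9-5+2"), unit, binary, "L"))
print(cyk(list("9-+2"), unit, binary, "L"))
```

Each of the three outer loops runs at most n times, and the work inside depends only on the grammar, which gives the n^3 factor.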
Parser as a program
• A parser is an executable program that takes a string as input.
• It then checks whether a parse tree can be constructed from this string.
• Actually building the parse tree as a data structure is not required in this chapter; we will construct the tree data structure in later chapters.
• For now, the software only checks whether a parse tree can be constructed.
• For this purpose, the parser contains string-processing routines that:
• Check whether a substring (from the i-th character to the j-th character) is correctly identified as the body of a single production.
• Such a body may be recursive, so these procedures are mostly recursive.
Recursive routines
• The body of a production may be recursive, so these procedures are also recursive.
• bool Exp( ) {
– if (!Term( )) return false; // base case: every expression starts with a term
– while (lookahead == '+') { // Exp → Exp + Term, with the left recursion written as a loop so that Exp( ) does not call itself forever
– match('+'); if (!Term( )) return false; // an error at any stage returns false
– }
– return true; // the expression was completely divided into terms
• }
• If the outermost call returns true, the program was parsed successfully and is syntactically correct.
Parser(syntax) error
• A parser can report several kinds of errors, but:
– The most common error occurs when a substring does not correspond to any nonterminal of the language.
– In other words, the pattern of the substring is such that the parser cannot match it to any well-defined programming construct.
– Common constructs of the C++ language include if-else, loops, declarations, compound statements, expressions, procedure call and return, array indexing, pointer operations, statement structure, and nesting.
Multiple errors
• Interestingly, a parser does not have to stop at the very first parse error.
• C++ (and most other languages) has very well-defined structured statements.
• Even if a statement is syntactically incorrect, the parser can find the end of that statement or the start of the next one.
• The parser then restarts from the next statement.
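This restart strategy is often called panic-mode recovery. The sketch below illustrates it on a hypothetical toy language in which every statement has the form id = num ; and the input is already a list of token kinds. The function name check_statements and the toy language are assumptions made for the example.

```python
def check_statements(tokens):
    """Return the start positions of all malformed statements.

    Toy language (assumed): each statement is the token sequence  id = num ;
    On an error the parser skips ahead to the next ';' and carries on.
    """
    expected = ["id", "=", "num", ";"]
    errors = []
    pos = 0
    while pos < len(tokens):
        start = pos
        ok = True
        for kind in expected:
            if pos < len(tokens) and tokens[pos] == kind:
                pos += 1
            else:
                ok = False
                break
        if not ok:
            errors.append(start)
            # Recovery: discard tokens up to and including the end of statement.
            while pos < len(tokens) and tokens[pos] != ";":
                pos += 1
            pos += 1
    return errors


program = [
    "id", "=", "num", ";",   # correct
    "id", "num", ";",        # missing '='
    "id", "=", ";",          # missing operand
    "id", "=", "num", ";",   # correct
]
print(check_statements(program))
```

Both faulty statements are reported in a single run because the parser resynchronizes on the statement terminator instead of giving up at the first error.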
Parser routines
• Parsing now means writing procedures that process the input string and (logically or actually) generate a parse tree, or verify that such a tree can be constructed.
• Normally there is more than one such subroutine, each relevant to one nonterminal.
• Each subroutine detects the pattern of some programming construct and decides which nonterminal is relevant to that pattern.
• It can then create the child nodes of that nonterminal and attach them to it.
• A bottom-up parser first detects patterns in the input string and then calls the subroutine of the relevant nonterminal.
• A top-down parser starts from the start symbol and calls the subroutine for that symbol. In its body, the subroutine scans the string from the left, detects the symbols that belong to other nonterminals, and calls the subroutines for those nonterminals.
Top-Down parsing
• Start from the production of the start symbol.
– (Normally there is a unique start symbol, and its production is also unique or can be separated.)
– i.e., call the subroutine relevant to the start symbol.
– Stmts, exp, main, etc. are start symbols, as they represent a whole program, expression, or set of statements.
– This subroutine checks each symbol in the program body and matches it against the other nonterminals or the terminals.
Predictive Parsing
• Recursive descent parsing is a top-down parsing method
– One (recursive) procedure is written for each nonterminal. This procedure is responsible for parsing the input tokens that belong to the nonterminal’s syntactic category
– When a nonterminal has multiple productions, each production is implemented in a branch of a selection statement based on input lookahead information
• Predictive parsing is a special form of recursive descent parsing where we use one lookahead token to unambiguously determine the parse operations.
– The first token of each construct predicts which production applies, and therefore which symbols must follow.
type → simple
| ^ id
| array [ simple ] of type
simple → integer
| char
| num dotdot num
Input: array [ num dotdot num ] of integer
At each step the parser checks the lookahead and calls match:
lookahead   call
array       type() calls match(‘array’)
[           match(‘[’)
num         simple() calls match(‘num’)
dotdot      match(‘dotdot’)
num         match(‘num’)
]           match(‘]’)
of          match(‘of’)
integer     type() calls simple(), which calls match(‘integer’)
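The predictive parser for the type grammar can be sketched as one method per nonterminal plus match. This is an illustration under assumptions: the input is already split into tokens by whitespace, "$" is an invented end-of-input marker, and the class and method names (PredictiveParser, parse_type, parse_simple) are not from the slides.

```python
class PredictiveParser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks end of input (assumed)
        self.pos = 0
        self.trace = []                      # tokens matched so far

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.lookahead != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.lookahead!r}")
        self.trace.append(expected)
        self.pos += 1

    def parse_type(self):
        # The lookahead alone selects the production for type.
        if self.lookahead in ("integer", "char", "num"):
            self.parse_simple()
        elif self.lookahead == "^":
            self.match("^")
            self.match("id")
        elif self.lookahead == "array":
            self.match("array")
            self.match("[")
            self.parse_simple()
            self.match("]")
            self.match("of")
            self.parse_type()
        else:
            raise SyntaxError(f"unexpected token {self.lookahead!r}")

    def parse_simple(self):
        if self.lookahead == "integer":
            self.match("integer")
        elif self.lookahead == "char":
            self.match("char")
        elif self.lookahead == "num":
            self.match("num")
            self.match("dotdot")
            self.match("num")
        else:
            raise SyntaxError(f"unexpected token {self.lookahead!r}")


def parse(text):
    parser = PredictiveParser(text.split())
    parser.parse_type()
    parser.match("$")
    return parser.trace


print(parse("array [ num dotdot num ] of integer"))
```

No backtracking is needed: every branch is chosen by looking at a single token, which works because the productions of each nonterminal start with different tokens.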
Input: {x=3;{y=4;};}
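bool L( ) {
    // Assumed: NextTokenFromLexer( ) peeks at the next token without consuming it,
    // and E( ) and Match( ) return bool in the same way as L( ).
    Token lookahead = NextTokenFromLexer( );
    if (lookahead == n || lookahead == '{') {
        // L → E ; L : fail as soon as any part fails
        return E( ) && Match(';') && L( );
    }
    return true; // L → ε : the empty list is always accepted
}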
Pre-Mid Assignment
(individual):
– Construct a lexical analyzer/tokenizer and submit it in my room (a viva will be taken).
– Those who cannot do it themselves, kindly do not bother to submit another person’s work or copies from the internet. It is better not to submit than to get an F for cheating.
Lexical analyzer
Input: y := 31 + 28*x
lexan() turns the input into the token stream:
<id, “y”> <assign, > <num, 31> <‘+’, > <num, 28> <‘*’, > <id, “x”>
The parser, parse(), receives each token in lookahead and its token attribute in tokenval.
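A lexical analyzer of this kind fits in a few lines. The sketch below is one possible shape, not the reference solution for any assignment: it assumes identifiers are letters followed by letters or digits, numbers are unsigned integers, and it returns (token, tokenval) pairs with None where a token has no attribute.

```python
def lexan(source):
    """Yield (token, tokenval) pairs for a small expression language."""
    i = 0
    n = len(source)
    while i < n:
        ch = source[i]
        if ch.isspace():                      # skip white space
            i += 1
        elif ch.isdigit():                    # num: a run of digits
            j = i
            while j < n and source[j].isdigit():
                j += 1
            yield ("num", int(source[i:j]))
            i = j
        elif ch.isalpha():                    # id: letter (letter | digit)*
            j = i
            while j < n and source[j].isalnum():
                j += 1
            yield ("id", source[i:j])
            i = j
        elif source.startswith(":=", i):      # two-character assignment token
            yield ("assign", None)
            i += 2
        elif ch in "+-*/()":                  # single-character tokens
            yield (ch, None)
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at position {i}")


for token, tokenval in lexan("y := 31 + 28*x"):
    print(token, tokenval)
```

A parser would call this one token at a time, keeping the current pair as its lookahead and tokenval.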