Construction of Syntax Trees

The use of syntax trees as an intermediate representation allows translation to be decoupled from parsing.
Translation routines that are invoked during parsing must live with two
kinds of restrictions.
1. A grammar that is suitable for parsing may not reflect the natural
hierarchical structure of the constructs in the language.
Example: a grammar for Fortran may view a subroutine as consisting simply of a list of statements. However, analysis of the subroutine may be easier if we use a tree representation that reflects the nesting of DO loops.
2. The parsing method constrains the order in which nodes in a parse tree are considered. This order may not match the order in which information about a construct becomes available. For this reason, compilers for C usually construct syntax trees for declarations.
Syntax trees:
The production S → if B then S1 else S2 might appear in a syntax tree as:

      if-then-else
      /    |    \
     B     S1    S2
In a syntax tree, operators and keywords do not appear as leaves, but rather
are associated with the interior node that would be the parent of those leaves
in the parse tree.
Similarly, the syntax tree for the expression 3 * 5 + 4 is:

        +
       / \
      *   4
     / \
    3   5
Constructing Syntax Trees for expressions:
Each node in a syntax tree can be implemented as a record with several fields.
In the node for an operator, one field identifies the operator and the remaining fields contain pointers to the nodes for the operands.
When used for translation, the nodes in a syntax tree may have additional fields to hold the values (or pointers to values) of attributes attached to the node. We use the following functions to create the nodes of syntax trees for expressions with binary operators.
Each function returns a pointer to the newly created node.
1. mknode(op, left, right) creates an operator node with label op and two fields containing pointers to left and right.
2. mkleaf(id, entry) creates an identifier node with label id and a field containing entry, a pointer to the symbol-table entry for the identifier.
3. mkleaf(num, val) creates a number node with label num and a field containing val, the value of the number.
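As a minimal sketch (in Python rather than the notation of these notes), the node record and the construction functions might look as follows. The Node class, its field names, and the use of a dict as a stand-in for a symbol-table entry are illustrative assumptions; the two mkleaf forms are merged into one function whose first argument is the label 'id' or 'num'.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """Illustrative syntax-tree record (field names are assumptions)."""
    label: str                      # operator such as '+', or 'id' / 'num' for leaves
    left: Optional["Node"] = None   # pointer to the left operand (operator nodes only)
    right: Optional["Node"] = None  # pointer to the right operand (operator nodes only)
    value: object = None            # symbol-table entry (id) or numeric value (num)

def mknode(op, left, right):
    """Create an operator node labelled op with pointers to its two operands."""
    return Node(op, left, right)

def mkleaf(label, value):
    """Create a leaf: mkleaf('id', entry) or mkleaf('num', val)."""
    return Node(label, value=value)

# Usage: the tree for b * 2, with a dict standing in for b's symbol-table entry.
entryb = {"name": "b"}
t = mknode("*", mkleaf("id", entryb), mkleaf("num", 2))
print(t.label, t.left.value["name"], t.right.value)  # * b 2
```

Each call returns the new Node object itself, which plays the role of the "pointer to the newly created node".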
Example: the following sequence of function calls creates the syntax tree for the expression a - 4 + c.
Here p1, p2, p3, p4, p5 are pointers to nodes, and entrya and entryc are pointers to the symbol-table entries for identifiers a and c.
p1 := mkleaf(id, entrya);
p2 := mkleaf(num, 4);
p3 := mknode('-', p1, p2);
p4 := mkleaf(id, entryc);
p5 := mknode('+', p3, p4);

The resulting syntax tree:

            +
           / \
          -   id  (to entry for c)
         / \
       id   num 4
  (to entry for a)
The tree is constructed bottom-up.
The function calls mkleaf(id, entrya) and mkleaf(num, 4) construct the leaves for a and 4; the pointers to these nodes are saved using p1 and p2.
The call mknode('-', p1, p2) then constructs the interior node with the leaves for a and 4 as children.
After two more steps, p5 is left pointing to the root.
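The same five calls can be replayed in a small, self-contained Python sketch to confirm that the leaves are built first and that p5 ends up as the root of the tree for a - 4 + c. The Node class, the dict-based symbol-table entries, and the helper show are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    label: str                      # operator, 'id', or 'num'
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: object = None            # symbol-table entry or number

def mknode(op, left, right):
    return Node(op, left, right)

def mkleaf(label, value):
    return Node(label, value=value)

def show(n):
    """Render a tree in prefix form, e.g. (+ (- a 4) c)."""
    if n.label == "id":
        return n.value["name"]
    if n.label == "num":
        return str(n.value)
    return f"({n.label} {show(n.left)} {show(n.right)})"

entrya = {"name": "a"}   # hypothetical symbol-table entries
entryc = {"name": "c"}

# Same call sequence as in the notes: leaves first, then the interior nodes.
p1 = mkleaf("id", entrya)
p2 = mkleaf("num", 4)
p3 = mknode("-", p1, p2)
p4 = mkleaf("id", entryc)
p5 = mknode("+", p3, p4)

print(show(p5))  # (+ (- a 4) c)
```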

Intermediate Code Generation:
Position of the intermediate code generator:
Parser → Static Checker → Intermediate Code Generator → (intermediate code) → Code Generator
Intermediate Languages:
Syntax trees and postfix notation, discussed earlier, are two kinds of intermediate representation; a third is three-address code.
The semantic rules for generating three-address code from common
programming language constructs are similar to those for constructing syntax
trees or for generating postfix notation.
Graphical Representations:
A syntax tree depicts the natural hierarchical structure of a source program.
A dag gives the same information but in a more compact way because common subexpressions are identified. The syntax tree and dag for the assignment statement a := b * -c + b * -c are shown below.

Syntax tree:

          assign
        /        \
       a          +
               /     \
            *           *
           / \         / \
          b  uminus   b  uminus
               |           |
               c           c

Dag (both edges from + lead to the same * node):

          assign
        /        \
       a          +
                 / \
                 \ /
                  *
                 / \
                b  uminus
                     |
                     c
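One way to obtain the dag instead of the tree, sketched here in Python as an assumption rather than as the method of these notes, is to have the node-building function first look up whether a node with the same label and the same children already exists and, if so, return it (a technique often called hash-consing or value numbering). The classes DagBuilder and TreeBuilder are hypothetical; each node gets a number, and its lookup key is its label plus its children's numbers.

```python
class DagBuilder:
    """Reuses an existing node whenever an identical (label, children) node is requested."""

    def __init__(self):
        self.nodes = []   # node number -> key (label, child numbers or leaf name)
        self.index = {}   # key -> node number

    def _get(self, key):
        if key not in self.index:          # create only if no identical node exists
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

    def leaf(self, name):
        return self._get(("id", name))

    def node(self, op, *kids):
        return self._get((op,) + kids)


class TreeBuilder(DagBuilder):
    """Always creates a fresh node, which yields an ordinary syntax tree."""

    def _get(self, key):
        self.nodes.append(key)
        return len(self.nodes) - 1


def build(g):
    """Build a := b * -c + b * -c with builder g."""
    lhs = g.node("*", g.leaf("b"), g.node("uminus", g.leaf("c")))
    rhs = g.node("*", g.leaf("b"), g.node("uminus", g.leaf("c")))
    return g.node("assign", g.leaf("a"), g.node("+", lhs, rhs))


tree, dag = TreeBuilder(), DagBuilder()
build(tree)
build(dag)
print(len(tree.nodes), len(dag.nodes))  # 11 7
```

In the dag, the + node's key is ("+", 3, 3): both of its operands are the single node for b * -c.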
Three-Address Code:
Three-address code is a sequence of statements of the general form:
X := Y op Z
where X, Y, and Z are names, constants, or compiler-generated temporaries, and op stands for any operator.
Thus a source-language expression like X + Y * Z might be translated into the sequence:
t1 := Y * Z
t2 := X + t1
where t1 and t2 are compiler-generated temporary names.
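A minimal Python sketch of how such a sequence can be produced: a postorder walk of the expression tree emits one statement per interior node and names each result with a fresh temporary. The tuple encoding of the tree and the function name gen are assumptions made for illustration.

```python
from itertools import count

def gen(tree):
    """Emit three-address code for an expression tree by postorder traversal.
    A tree is either a name (str) or a tuple (op, left, right)."""
    code = []
    temps = count(1)  # yields 1, 2, 3, ... for t1, t2, t3, ...

    def walk(t):
        if isinstance(t, str):            # a name needs no code; it is its own address
            return t
        op, left, right = t
        a, b = walk(left), walk(right)    # operands first (postorder)
        tmp = f"t{next(temps)}"           # fresh temporary for this interior node
        code.append(f"{tmp} := {a} {op} {b}")
        return tmp

    walk(tree)
    return code

for line in gen(("+", "X", ("*", "Y", "Z"))):
    print(line)
# t1 := Y * Z
# t2 := X + t1
```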
Three-address code is a linearized representation of a syntax tree or a dag in which explicit names correspond to the interior nodes of the graph.
The syntax tree and dag above are represented by the following three-address code sequences:

1. Code for the syntax tree:
t1 := -c
t2 := b * t1
t3 := -c
t4 := b * t3
t5 := t2 + t4
a := t5

2. Code for the dag:
t1 := -c
t2 := b * t1
t5 := t2 + t2
a := t5
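The difference between the two sequences comes from sharing: if code generation remembers the temporary it has already assigned to a node, a node reached twice in the dag is computed only once. The Python sketch below (class and function names are hypothetical) produces both sequences; because it numbers temporaries consecutively, the dag's sum is named t3 rather than t5.

```python
from itertools import count

class Node:
    """Interior node; leaves are plain name strings. Node identity is what matters."""
    def __init__(self, op, *kids):
        self.op, self.kids = op, kids

def gen(root):
    """Emit three-address code for an assignment tree or dag.
    A node reached a second time reuses the temporary already computed for it."""
    code, temps, done = [], count(1), {}

    def walk(n):
        if isinstance(n, str):            # a name is its own address
            return n
        if n in done:                     # shared dag node: value already computed
            return done[n]
        if n.op == "assign":
            target, expr = n.kids
            rhs = walk(expr)
            code.append(f"{target} := {rhs}")
            res = target
        elif n.op == "uminus":
            x = walk(n.kids[0])
            res = f"t{next(temps)}"
            code.append(f"{res} := -{x}")
        else:                             # binary operator
            a, b = walk(n.kids[0]), walk(n.kids[1])
            res = f"t{next(temps)}"
            code.append(f"{res} := {a} {n.op} {b}")
        done[n] = res                     # remember this node's result
        return res

    walk(root)
    return code

def tree():
    """Syntax tree: each occurrence of b * -c is a separate subtree."""
    left = Node("*", "b", Node("uminus", "c"))
    right = Node("*", "b", Node("uminus", "c"))
    return Node("assign", "a", Node("+", left, right))

def dag():
    """Dag: the common subexpression b * -c is a single node used twice."""
    shared = Node("*", "b", Node("uminus", "c"))
    return Node("assign", "a", Node("+", shared, shared))

print("\n".join(gen(tree())))
print()
print("\n".join(gen(dag())))
```

Run as is, the first listing matches the six statements for the syntax tree, and the second prints t1 := -c, t2 := b * t1, t3 := t2 + t2, a := t3.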
The reason for the term “three-address code” is that each statement usually
contains three addresses, two for the operands and one for the result.
In implementations of three-address code, a programmer-defined name is replaced by a pointer to the symbol-table entry for that name.
Types of three-address statements:
Assignment statements of the form X := Y op Z, where op is a binary arithmetic
or logical operation.
Assignment instructions of the form X := op Y, where op is a unary operation.
Copy statements of the form X := Y where the value of Y is assigned to X.
The unconditional jump goto L. The three-address statement with label L is the next
to be executed.
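Conditional jumps such as if X relop Y goto L. This statement applies a relational operator (<, =, >=, etc.) to X and Y, and executes the statement with label L next if X stands in relation relop to Y; otherwise, the statement following if X relop Y goto L is executed next.
param X and call p, n for procedure calls, and return y, where y, representing a returned value, is optional. Their typical use is as the sequence of three-address statements
param x1
param x2
param x3
...
param xn
call p, n
generated as part of a call of the procedure p(x1, x2, ..., xn), where n is the number of actual parameters.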
Indexed assignments of the form X := Y[i] and X[i] := Y
Address and pointer assignments of the form X := &Y, X := *Y, and *X := Y
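To make the control-flow statements concrete, here is a tiny Python interpreter for a subset of these forms (binary assignment, copy, unconditional jump, and conditional jump). The tuple encoding, the use of list positions as labels, and the sample program summing 1 to n are illustrative assumptions, not part of the notes.

```python
import operator

BINOPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
RELOPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
          "<>": operator.ne, ">": operator.gt, ">=": operator.ge}

def run(prog, env):
    """Execute three-address statements; a label is an index into prog.
    Hypothetical encoding:
      ("op", x, y, op, z)      x := y op z
      ("copy", x, y)           x := y
      ("goto", L)              goto L
      ("if", x, relop, y, L)   if x relop y goto L
    Operands are variable names (str) or integer constants."""
    def val(a):
        return env[a] if isinstance(a, str) else a

    pc = 0
    while pc < len(prog):
        stmt = prog[pc]
        pc += 1                           # by default, fall through to the next statement
        kind = stmt[0]
        if kind == "op":
            _, x, y, op, z = stmt
            env[x] = BINOPS[op](val(y), val(z))
        elif kind == "copy":
            _, x, y = stmt
            env[x] = val(y)
        elif kind == "goto":
            pc = stmt[1]                  # statement with label L is executed next
        elif kind == "if":
            _, x, relop, y, label = stmt
            if RELOPS[relop](val(x), val(y)):
                pc = label                # jump only when the relation holds
        else:
            raise ValueError(f"unknown statement {stmt!r}")
    return env

# Sum 1..n; the comment on each line gives its label (list index).
prog = [
    ("copy", "s", 0),             # 0: s := 0
    ("copy", "i", 1),             # 1: i := 1
    ("if", "i", ">", "n", 6),     # 2: if i > n goto 6
    ("op", "s", "s", "+", "i"),   # 3: s := s + i
    ("op", "i", "i", "+", 1),     # 4: i := i + 1
    ("goto", 2),                  # 5: goto 2
]                                 # label 6 is just past the end, so the program stops
print(run(prog, {"n": 4})["s"])   # 10
```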
Implementation of Three-Address Statements:
A three-address statement is an abstract form of intermediate code.
In a compiler, these statements can be implemented as records with fields for the operator and the operands.
There are three such representations: quadruples, triples, and indirect triples.
Quadruples:
A quadruple is a record structure with four fields: op, arg1, arg2, and result.
The op field contains an internal code for the operator.
A statement X := Y op Z puts Y in arg1, Z in arg2, and X in result.
Statements like X := -Y or X := Y do not use arg2.
Operators like param use neither arg2 nor result.
Conditional and unconditional jumps put the target label in result.
Below are the quadruples for the assignment a := b * -c + b * -c (using the three-address code for the syntax tree above):

       op      arg1  arg2  result
(0)    uminus  c           t1
(1)    *       b     t1    t2
(2)    uminus  c           t3
(3)    *       b     t3    t4
(4)    +       t2    t4    t5
(5)    :=      t5          a

The contents of fields arg1, arg2, and result are normally pointers to the symbol-table entries for the names represented by these fields. If so, temporary names must be entered into the symbol table as they are created.
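A quadruple maps directly onto a four-field record. In this Python sketch (the Quad class is hypothetical, and plain strings stand in for symbol-table pointers), unused fields are None, matching the rule that unary minus and copies leave arg2 empty.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Quad:
    """One quadruple. Strings stand in for symbol-table pointers; unused fields are None."""
    op: str
    arg1: Optional[str] = None
    arg2: Optional[str] = None
    result: Optional[str] = None

# The six quadruples for a := b * -c + b * -c
quads = [
    Quad("uminus", "c", None, "t1"),
    Quad("*", "b", "t1", "t2"),
    Quad("uminus", "c", None, "t3"),
    Quad("*", "b", "t3", "t4"),
    Quad("+", "t2", "t4", "t5"),
    Quad(":=", "t5", None, "a"),
]

print(f"{'':6}{'op':8}{'arg1':6}{'arg2':6}result")
for i, q in enumerate(quads):
    print(f"({i}){'':3}{q.op:8}{q.arg1 or '':6}{q.arg2 or '':6}{q.result or ''}")

# Every temporary appears in a result field, so each needs its own symbol-table entry.
temporaries = [q.result for q in quads if q.result.startswith("t")]
print(temporaries)  # ['t1', 't2', 't3', 't4', 't5']
```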
Triples:
To avoid entering temporary names into the symbol table, we might refer to a temporary value by the position of the statement that computes it.
Three-address statements can then be represented by records with only three fields: op, arg1, and arg2.
The fields arg1 and arg2, for the arguments of op, are either pointers to the symbol table or pointers into the triple structure.
Since three fields are used, this intermediate code format is known as triples.
The triples for a := b * -c + b * -c are:

       op      arg1  arg2
(0)    uminus  c
(1)    *       b     (0)
(2)    uminus  c
(3)    *       b     (2)
(4)    +       (1)   (3)
(5)    assign  a     (4)

Parenthesized numbers represent pointers into the triple structure, while the names themselves represent pointers to the symbol table.
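Going from quadruples to triples amounts to replacing each temporary by the position of the statement that computes it. A rough Python sketch follows; the function names, the tuple layout, and the convention that temporaries are names of the form t followed by digits are assumptions for illustration.

```python
import re

def is_temp(name):
    """Assumed convention: compiler temporaries are named t1, t2, ..."""
    return name is not None and re.fullmatch(r"t\d+", name) is not None

def to_triples(quads):
    """Convert (op, arg1, arg2, result) quadruples into (op, arg1, arg2) triples."""
    pos = {}        # temporary name -> position of the triple that computes it
    triples = []

    def ref(a):
        # A temporary becomes a pointer "(i)" into the triple list; other
        # operands (names, constants, None) are kept as they are.
        return f"({pos[a]})" if a in pos else a

    for op, arg1, arg2, result in quads:
        if op == ":=":                       # copy x := y becomes (assign, x, y)
            triples.append(("assign", result, ref(arg1)))
            continue
        i = len(triples)
        triples.append((op, ref(arg1), ref(arg2)))
        if is_temp(result):
            pos[result] = i                  # later uses refer to triple i
        else:                                # x := y op z with x a program name
            triples.append(("assign", result, f"({i})"))
    return triples

quads = [
    ("uminus", "c", None, "t1"),
    ("*", "b", "t1", "t2"),
    ("uminus", "c", None, "t3"),
    ("*", "b", "t3", "t4"),
    ("+", "t2", "t4", "t5"),
    (":=", "t5", None, "a"),
]
for i, (op, a1, a2) in enumerate(to_triples(quads)):
    print(f"({i}) {op:8}{a1 or '':6}{a2 or ''}")
```

The printed rows match the triple table above; note that no temporary name survives in the output.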
Indirect Triples:
Indirect triples list pointers to triples, rather than listing the triples themselves. For a := b * -c + b * -c, the statement list points into the triple array:

       statement
(0)    (14)
(1)    (15)
(2)    (16)
(3)    (17)
(4)    (18)
(5)    (19)

       op      arg1  arg2
(14)   uminus  c
(15)   *       b     (14)
(16)   uminus  c
(17)   *       b     (16)
(18)   +       (15)  (17)
(19)   assign  a     (18)
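A practical consequence of this extra level of indirection, sketched in Python below with hypothetical names: because triples refer to one another by position, an optimizing compiler can reorder statements by permuting the statement list alone, leaving the triples themselves untouched. The helper well_ordered checks that every triple's operands are computed before it in a chosen order.

```python
# Triple array indexed by position; numbering follows the table above (14..19).
triples = {
    14: ("uminus", "c", None),
    15: ("*", "b", "(14)"),
    16: ("uminus", "c", None),
    17: ("*", "b", "(16)"),
    18: ("+", "(15)", "(17)"),
    19: ("assign", "a", "(18)"),
}

def refs(triple):
    """Positions of the triples that this triple uses as operands."""
    return [int(a[1:-1]) for a in triple[1:] if a is not None and a.startswith("(")]

def well_ordered(statement, triples):
    """True if every triple's operands are computed earlier in statement order."""
    seen = set()
    for s in statement:
        if any(r not in seen for r in refs(triples[s])):
            return False
        seen.add(s)
    return True

statement = [14, 15, 16, 17, 18, 19]   # the statement list above
reordered = [16, 17, 14, 15, 18, 19]   # compute the second b * -c first
broken = [18, 14, 15, 16, 17, 19]      # uses (15) and (17) before they are computed

for order in (statement, reordered, broken):
    print(order, well_ordered(order, triples))
```

Only the list of pointers changes between the three orders; the triple array is shared by all of them.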
