
Compiler Design (KCS-502)

3rd year (Semester – V)


Session – 2023 - 24
Unit – III
Part - A
Ratish Srivastava
Asst. Prof.
CSE Dept.
UCER, Prayagraj
Compiler Design, KCS-502 1
Intermediate Code
• In many compilers the source code is translated
into a language which is intermediate in
complexity between a high-level programming
language and machine code. Such a language is
therefore called ‘intermediate code’ or
‘intermediate text’.

• It is possible to translate directly from source to
machine or assembly language in a syntax-directed
way, but doing so makes the generation of optimal
code a difficult task.
• Efficient machine or assembly language is hard to
generate directly because one is immediately forced to
choose a particular register to hold the result of each
computation.
• Therefore one usually chooses for intermediate text a
notation in which each statement involves at most one
arithmetic operation or one test.
• The usual intermediate text introduces symbols to
stand for various temporary quantities such as the
value of B*C in the source language expression A+B*C.

• Types of intermediate code often used in
compilers are:
– Postfix notation
– Syntax trees
– Three address code
(representations used to implement three-address code:)
o Quadruples
o Triples
o Indirect triples
Postfix Notation
• In general, if e1 and e2 are any postfix expressions and
θ is any binary operator, the result of applying θ to the
values denoted by e1 and e2 is indicated in postfix
notation by e1e2θ.
• No parentheses are needed in postfix notation because
the position and arity (number of arguments) of the
operators permit only one way to decode a postfix
expression.
• If a k-ary operator θ is applied to postfix expressions
e1, e2, …, ek, the result is denoted by e1e2…ekθ.

• Let us introduce a useful 3-ary (ternary) operator,
the conditional expression.
• Let if e then x else y denote the expression whose
value is x if e ≠ 0 and y if e = 0.
• Using ? as a ternary postfix operator, we can
represent this expression as exy?
• The postfix form of the expression
if a then if c-d then a+c else a*c else a+b
is acd-ac+ac*?ab+?

• Note:
– One language that normally uses a postfix
intermediate language is SNOBOL. In fact, SNOBOL
is often interpreted rather than compiled. The
output of the SNOBOL compiler is the intermediate
code itself which is passed to an interpreter, which
reads the intermediate code and executes it.

• Evaluation of Postfix Expressions:
– We can evaluate the postfix expression easily using a stack,
either a hardware stack or one implemented in software.
– The general strategy is to scan the postfix code left to right.
– We push each operand onto the stack.
– If we encounter a k-ary operator, its first (leftmost)
argument will be k-1 positions below the top of the stack.
– It is then easy to apply the operator to the top k values on
the stack.
– These values are popped and the result of applying the
k-ary operator is pushed onto the stack.
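The stack-based evaluation strategy described above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: it assumes single-character tokens, operand values supplied in a dictionary, and a hypothetical operator table `OPS` that includes the ternary conditional `?` defined earlier (value x if e ≠ 0, else y).

```python
import operator

# Hypothetical operator table: symbol -> (arity, function of the popped arguments).
OPS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
    "/": (2, operator.truediv),
    # Ternary conditional e x y ?: value is x if e != 0, else y.
    "?": (3, lambda e, x, y: x if e != 0 else y),
}


def eval_postfix(tokens, env):
    """Evaluate a postfix token sequence with an explicit stack.

    tokens: iterable of single-character tokens (operands or operators).
    env: mapping from operand names to their values.
    """
    stack = []
    for tok in tokens:
        if tok in OPS:
            arity, fn = OPS[tok]
            if len(stack) < arity:
                raise ValueError(f"operator {tok!r} needs {arity} operands")
            # The leftmost argument sits arity-1 positions below the top.
            args = stack[-arity:]
            del stack[-arity:]
            stack.append(fn(*args))  # push the result back
        else:
            stack.append(env[tok])   # operand: push its value
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]
```

For instance, with a = 2, b = 3 and c = d = 5, evaluating `acd-ac+ac*?ab+?` yields 10 (the value of a*c, since a ≠ 0 and c-d = 0). Note that both branch values a+c and a*c are computed before the inner ? is applied.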
• Control Flow in Postfix Code:
– Postfix notation is useful as intermediate code if the
language consists mostly of expressions. However, all operands
of an operator such as ? are evaluated before the operator is
applied, so if some operands are undefined or have side effects,
the postfix implementation would not only be inefficient but
might also be incorrect.
– One solution is to introduce labels and conditional and
unconditional jumps into the postfix code.
• The postfix code can then be stored in a one-dimensional array, with
each word of the array being either an operator or an operand.
• Operands are represented by pointers to the symbol table and
operators by integer codes.
• To distinguish operators from operands we use negative integers
for operator codes.
• In this implementation, a label is just an index into the array
holding the code.
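The array layout just described can be sketched with a small interpreter. The operator codes `JLT` and `JUMP`, their meanings, and the choice to store each label as a constant symbol-table entry are all assumptions made for illustration; the sketch only shows how negative integers can mark operators and how a label is just an index into the code array.

```python
# Hypothetical operator codes: negative integers, as described above.
JLT = -1   # e1 e2 L JLT : jump to L if e1 < e2
JUMP = -2  # L JUMP      : jump unconditionally to L


def run(code, symtab):
    """Interpret a one-dimensional postfix code array.

    Non-negative words are indices into symtab (operands); negative words
    are operator codes. A label is an index into `code`; here it is stored
    as a constant entry in the symbol table (an assumption).
    """
    stack = []
    pc = 0
    while pc < len(code):
        word = code[pc]
        pc += 1
        if word >= 0:
            stack.append(symtab[word])  # operand: push its value
        elif word == JLT:
            label = stack.pop()
            e2 = stack.pop()
            e1 = stack.pop()
            if e1 < e2:
                pc = label
        elif word == JUMP:
            pc = stack.pop()
        else:
            raise ValueError(f"unknown operator code {word}")
    return stack


# if a < b then a else b
# Symbol table: 0 -> a, 1 -> b, 2 -> label L1 (= 7), 3 -> label L2 (= 8)
code = [0, 1, 2, JLT,   # if a < b goto L1
        1, 3, JUMP,     # push b; goto L2
        0]              # L1 (index 7): push a; L2 (index 8) is the end
print(run(code, [3, 5, 7, 8]))  # [3]
print(run(code, [9, 4, 7, 8]))  # [4]
```

Because control jumps over the branch that is not taken, only one of the two operands is evaluated, which avoids the problem with undefined operands and side effects noted above.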
• Syntax-Directed Translation to Postfix Code:
– The production of postfix intermediate code for expressions is simple.
– It is described by the syntax-directed translation scheme as follows:
Production            Semantic Action
E → E(1) op E(2)      E.CODE := E(1).CODE || E(2).CODE || ‘op’
E → (E(1))            E.CODE := E(1).CODE
E → id                E.CODE := id
• Here E.CODE is a string-valued translation, and || denotes string concatenation.
• The value of the translation E.CODE for the first production is the
concatenation of the two translations E(1).CODE and E(2).CODE and the
symbol ‘op’ which stands for any operator symbol.
• In the second rule, the translation of a parenthesized expression is the
same as that for the unparenthesized expression.
• The third rule tells us that the translation of any identifier is the identifier
itself.

– The semantic actions in this translation scheme have a
particularly simple form.
– The translation of the non-terminal on the left of each
production is the concatenation of the translations of
the non-terminals on the right in the same order as in
the production, followed by some additional string.
– Such a translation scheme is called ‘Simple Postfix’
and it can be implemented without a translation stack
just by emitting the output string after each
reduction.
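Since each operator is emitted only after the translations of both of its operands, the ‘simple postfix’ scheme can be sketched as a one-pass Python translator that appends to an output list at each point where E → E(1) op E(2) would be reduced. This sketch uses recursive descent rather than an LR parser, and because the grammar above is ambiguous it assumes the usual precedence: ↑ (right-associative) binds tighter than * and /, which bind tighter than + and - (left-associative). Identifiers are assumed to be single letters.

```python
def to_postfix(src):
    """Translate an infix expression to postfix in a single pass.

    Each operator is appended to `out` only after both operands have been
    translated, i.e. where E -> E op E would be reduced; no translation
    stack of partial strings is kept.
    """
    toks = [c for c in src if not c.isspace()]
    pos = 0
    out = []

    def peek():
        return toks[pos] if pos < len(toks) else None

    def advance():
        nonlocal pos
        tok = toks[pos]
        pos += 1
        return tok

    def expr():              # E -> E + T | E - T | T
        term()
        while peek() in ("+", "-"):
            op = advance()
            term()
            out.append(op)   # emit the operator after the reduction

    def term():              # T -> T * F | T / F | F
        factor()
        while peek() in ("*", "/"):
            op = advance()
            factor()
            out.append(op)

    def factor():            # F -> P ↑ F | P   (right-associative)
        primary()
        if peek() == "↑":
            advance()
            factor()
            out.append("↑")

    def primary():           # P -> ( E ) | id
        tok = peek()
        if tok == "(":
            advance()
            expr()           # translation of (E) is that of E
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
        elif tok is not None and tok.isalpha():
            out.append(advance())  # translation of id is id itself
        else:
            raise SyntaxError(f"unexpected token {tok!r}")

    expr()
    if pos != len(toks):
        raise SyntaxError("trailing input")
    return "".join(out)
```

For example, `to_postfix("(a+(b*c))↑d")` returns `"abc*+d↑"`, agreeing with the worked example in this unit, and `to_postfix("a*(b+c)/d")` returns `"abc+*d/"`.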

Parse Tree and Syntax Trees
• The parse tree is a useful intermediate language
representation for a source program especially in
optimizing compilers where the intermediate code
needs to be extensively restructured.
• A parse tree, however, often contains redundant
information which can be eliminated, thus producing a
more economical representation of the source
program.
• One such variant of a parse tree is what is called an
(abstract) ‘syntax tree’, a tree in which each leaf
represents an operand and each interior node an
operator.
• For example, the syntax tree for the expression
a*(b+c)/d is:
[Figure: syntax tree for a*(b+c)/d]
and the syntax tree for the statement
if a=b then
a:=c+d
else
b:=c-d
is:
[Figure: syntax tree for the if-then-else statement]
• So, the syntax tree is nothing more than a condensed form of the
parse tree.
• The operator and keyword nodes of the parse tree are moved up to
their parents, and a chain of single productions is replaced by a
single link.

[Figure: parse tree for the string id+id*id; syntax tree for id+id*id]
Example:
Construct syntax tree and postfix notation for
the following expression:
(a+(b*c))↑d

Solution:
Postfix notation:
(a+(b*c))↑d
= (a+X)↑d     putting X = b*c, so X = bc*
= Y↑d         putting Y = a+X, so Y = aX+
= Z           putting Z = Y↑d, so Z = Yd↑
Now, substituting the values of the temporary
variables back, we get
Z = Yd↑
= aX+d↑
= abc*+d↑
So, the postfix expression is abc*+d↑
• Syntax-Directed Construction of Syntax trees:
– Like postfix code, it is easy to define either a parse tree or a
syntax tree in terms of a syntax-directed translation scheme.
– A syntax-directed translation scheme to construct syntax trees is
as follows:
Production            Semantic Action
E → E(1) op E(2)      {E.VAL := NODE(op, E(1).VAL, E(2).VAL)}
E → (E(1))            {E.VAL := E(1).VAL}
E → -E(1)             {E.VAL := UNARY(-, E(1).VAL)}
E → id                {E.VAL := LEAF(id)}
• E.VAL is a translation whose value is a pointer to a node in the syntax
tree.

• The function NODE(OP, LEFT, RIGHT) takes 3 arguments: the
first is the name of the operator, and the second and third are
pointers to the roots of subtrees.
– The function creates a new node labelled by the first argument
and makes the second and third arguments the left and right
children of the new node, returning a pointer to the created
node.

• The function UNARY(OP, CHILD) creates a new node labelled
OP, makes CHILD its child, and returns a pointer to the created
node.
• The function LEAF(ID) creates a new node labelled by ID and
returns a pointer to the node. This node receives no
children.
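The functions NODE, UNARY and LEAF can be sketched in Python as follows, with an object reference standing in for a pointer. The class `Node` and the display helpers `sexpr` and `postorder` are hypothetical additions; the upper-case function names simply mirror the text. The calls at the end fire in the order a bottom-up parser would reduce a*(b+c)/d.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                    # operator or identifier
    children: list = field(default_factory=list)  # empty for a leaf


def NODE(op, left, right):
    """New interior node labelled op with children left and right."""
    return Node(op, [left, right])


def UNARY(op, child):
    """New interior node labelled op with a single child."""
    return Node(op, [child])


def LEAF(ident):
    """New leaf labelled ident; it receives no children."""
    return Node(ident)


def sexpr(n):
    """Render a tree as (label child1 child2 ...)."""
    if not n.children:
        return n.label
    return "(" + " ".join([n.label] + [sexpr(c) for c in n.children]) + ")"


def postorder(n):
    """A postorder walk of the syntax tree spells out its postfix form."""
    return "".join(postorder(c) for c in n.children) + n.label


# Semantic actions in reduction order for a*(b+c)/d:
ea = LEAF("a")                 # E -> id
eb = LEAF("b")                 # E -> id
ec = LEAF("c")                 # E -> id
e_sum = NODE("+", eb, ec)      # E -> E(1) op E(2)
# E -> (E(1)) just copies the pointer; no new node is created.
e_prod = NODE("*", ea, e_sum)  # E -> E(1) op E(2)
ed = LEAF("d")                 # E -> id
root = NODE("/", e_prod, ed)   # E -> E(1) op E(2)

print(sexpr(root))      # (/ (* a (+ b c)) d)
print(postorder(root))  # abc+*d/
```

Note how the parentheses in the source leave no trace in the tree, which is exactly the redundancy a syntax tree removes from a parse tree.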

Three-Address Code, Quadruples and Triples
• The final category of intermediate code is known as ‘three-
address code’.
• This intermediate code is preferred in many compilers,
especially those doing extensive code optimization because
it allows the intermediate code to be rearranged in a
convenient manner.
• Three-Address Code:
– Three-address code is a sequence of statements of the general
form
x := y op z
where x, y and z are names, constants or compiler-generated
temporary names, and op stands for any operator, such as a fixed-
or floating-point arithmetic operator or a logical operator on
Boolean-valued data.
– There is only one operator on the right side of a
statement.
– Thus a source language expression like x+y*z might be
translated into a sequence
t1:=y*z
t2:=x+t1
where t1 and t2 are compiler-generated temporary
names.
– The reason for the term “three-address code” is that
each statement usually contains 3 addresses, two for
the operands and one for the result.
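The translation of x+y*z into t1 and t2 can be reproduced by a short recursive walk over an expression tree that allocates a fresh temporary for every operator. This is a minimal sketch: the tuple encoding of the tree and the function name `gen_tac` are assumptions, and no attempt is made to reuse temporaries or common subexpressions.

```python
import itertools


def gen_tac(tree):
    """Generate three-address code for an expression tree.

    tree is a name (str), a binary node (op, left, right) or a unary node
    (op, operand). Returns (code, place): the statements and the name
    that holds the value of the whole expression.
    """
    temps = itertools.count(1)
    code = []

    def walk(t):
        if isinstance(t, str):
            return t                          # a name needs no statement
        op, *operands = t
        places = [walk(o) for o in operands]  # translate operands first
        temp = f"t{next(temps)}"              # new compiler-generated temporary
        if len(places) == 2:
            code.append(f"{temp} := {places[0]} {op} {places[1]}")
        else:
            code.append(f"{temp} := {op}{places[0]}")
        return temp

    place = walk(tree)
    return code, place


code, place = gen_tac(("+", "x", ("*", "y", "z")))
for stmt in code:
    print(stmt)
# t1 := y * z
# t2 := x + t1
```

Each emitted statement has at most one operator on its right side, which is what makes the result three-address code.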

• Types of Three-Address Statements:
1) Assignment statements of the form x := y op z where ‘op’
is a binary arithmetic or logical operation.
2) Assignment instructions of the form x := op y, where op
is a unary operation. Essential unary operations include
unary minus, logical negation, shift operators and
conversion operators that convert a fixed-point number
to a floating-point number.
3) Copy statements of the form x := y where the value of y
is assigned to x.
4) The unconditional jump goto L. This means the three-
address statement with label L is the next to be
executed.

5) Conditional jumps such as if x relop y goto L. This
instruction applies a relational operator (<, =, >=,
etc.) to x and y, and executes the statement with
label L next if x stands in relation relop to y. If not,
the three-address statement following if x relop y
goto L is executed next, as in the usual sequence.
Note: relop means relational operator.
6) Indexed assignments of the form x := y[i] and
x[i] := y.
7) Address and pointer assignments of the form
x := &y, x := *y and *x := y.

8) param x and call p, n for procedure calls, and return y,
where y, representing a returned value, is optional. Their
typical use is as the sequence of three-address
statements:
param x1
param x2
…………….
param xn
call p, n
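generated as part of a call of the procedure p(x1, x2, …, xn).
The integer n, indicating the number of actual parameters
in “call p, n”, is not redundant because calls can be nested:
the param statements of an inner call may appear among those
of an outer call, so each call must state how many of the
preceding parameters belong to it.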
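The nesting argument can be illustrated by a small generator that emits a param statement as soon as each actual parameter has been translated. This is a sketch under assumptions: calls are encoded as hypothetical `(procedure, [arguments])` tuples, and a call whose value is used is written `t := call p, n`, a form not listed above.

```python
import itertools


def gen_call(expr, code, temps):
    """Emit three-address code for expr and return the name holding its value.

    expr is either a name or a tuple (proc, [arg, ...]).
    A param statement is emitted as soon as each argument is translated.
    """
    if isinstance(expr, str):
        return expr
    proc, args = expr
    for arg in args:
        code.append(f"param {gen_call(arg, code, temps)}")
    temp = f"t{next(temps)}"
    code.append(f"{temp} := call {proc}, {len(args)}")
    return temp


# p(x, q(y))
code = []
gen_call(("p", ["x", ("q", ["y"])]), code, itertools.count(1))
for stmt in code:
    print(stmt)
# param x
# param y
# t1 := call q, 1
# param t1
# t2 := call p, 2
```

Here param x is emitted before the call to q, so without the count in `call q, 1` the callee could not tell that only the most recent parameter belongs to it.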

• Implementations of Three-Address
Statements:
– A three-address statement is an abstract form of
intermediate code.
– In a compiler, these statements can be
implemented as records with fields for the
operator and the operands.
– Three such representations are quadruples,
triples and indirect triples.

• Quadruples:
– A quadruple is a record structure with 4 fields, which
we call op, arg1, arg2 and result.
– The op field contains an internal code for the
operator.
– The three-address statement x := y op z is represented
by placing y in arg1, z in arg2 and x in result.
– Statements with unary operators, like x := -y, and copy
statements, like x := y, do not use arg2.
– Operators like param use neither arg2 nor result.

– Conditional and unconditional jumps put the
target label in result.
– For example, an assignment statement like
A := -B * (C + D) would be translated into the
three-address statements:
T1 := -B
T2 := C + D
T3 := T1 * T2
A := T3

– These statements are represented by quadruples as:
Location op arg1 arg2 Result
(0) uminus B - T1
(1) + C D T2
(2) * T1 T2 T3
(3) := T3 - A

– The contents of fields arg1, arg2 and result are
normally pointers to the symbol-table entries for the
names represented by these fields.
– If so, temporary names must be entered into the
symbol table as they are created.
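A quadruple table for A := -B * (C + D) can be built mechanically from the expression tree. In this sketch the `Quad` record, the tuple encoding of the tree and the use of `None` for an unused field are assumptions; a real compiler would store symbol-table pointers rather than strings in arg1, arg2 and result.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    op: str
    arg1: str
    arg2: Optional[str]  # None when the field is unused
    result: str


def quads_for_assign(target, tree):
    """Translate target := tree into a list of quadruples.

    tree is a name, (op, left, right), or ("uminus", operand).
    Each operator gets a new temporary T1, T2, ... named in its result field.
    """
    quads = []
    counter = 0

    def walk(t):
        nonlocal counter
        if isinstance(t, str):
            return t
        op, *args = t
        places = [walk(a) for a in args]
        counter += 1
        temp = f"T{counter}"
        arg2 = places[1] if len(places) == 2 else None
        quads.append(Quad(op, places[0], arg2, temp))
        return temp

    # The copy A := T3 uses arg1 and result but not arg2.
    quads.append(Quad(":=", walk(tree), None, target))
    return quads


quads = quads_for_assign("A", ("*", ("uminus", "B"), ("+", "C", "D")))
for loc, q in enumerate(quads):
    print(f"({loc}) {q.op:<7} {q.arg1:<4} {q.arg2 or '-':<4} {q.result}")
```

Because every temporary appears by name in a result field, each one would have to be entered into the symbol table, as noted above.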

• Triples:
– To avoid entering temporary names into the symbol table,
we might refer to a temporary value by the position of the
statement that computes it.
– If we do so, three-address statements can be represented
by records with only three fields: op, arg1 and arg2.
– The fields arg1 and arg2, for the arguments of op, are
either pointers to the symbol table (for programmer
defined names or constants) or pointers into the triple
structure (for temporary values).
– Since three fields are used, this intermediate code format
is known as ‘triples’.


Triple representation of the three-address statements for A := -B * (C + D):
Location op arg1 arg2
(0) uminus B -
(1) + C D
(2) * (0) (1)
(3) := A (2)
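The same statements can be turned into triples by returning a position instead of a temporary name. In this minimal sketch the `Triple` record, the `Arg` alias and the `show` helper are hypothetical; an `int` argument stands for a pointer into the triple structure and a `str` for a symbol-table entry.

```python
from typing import NamedTuple, Optional, Union

# A triple argument is a name (a symbol-table entry) or the int position
# of the earlier triple that computes a temporary value.
Arg = Union[str, int]


class Triple(NamedTuple):
    op: str
    arg1: Arg
    arg2: Optional[Arg]  # None when the field is unused


def triples_for_assign(target, tree):
    """Translate target := tree into triples; no temporary names are created."""
    triples = []

    def walk(t):
        if isinstance(t, str):
            return t
        op, *args = t
        places = [walk(a) for a in args]
        arg2 = places[1] if len(places) == 2 else None
        triples.append(Triple(op, places[0], arg2))
        return len(triples) - 1  # the value is referred to by its position

    triples.append(Triple(":=", target, walk(tree)))
    return triples


def show(a):
    """Print positions as (n), names as themselves, unused fields as -."""
    if a is None:
        return "-"
    return f"({a})" if isinstance(a, int) else a


for loc, t in enumerate(triples_for_assign("A", ("*", ("uminus", "B"), ("+", "C", "D")))):
    print(f"({loc}) {t.op:<7} {show(t.arg1):<4} {show(t.arg2)}")
```

The printed rows match the table above: the temporaries T1, T2 and T3 of the quadruple version become the positions (0), (1) and (2).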

– A ternary operation like A[I] := B has three operands (A, I
and B), so it actually requires two entries in the triple structure:
Location op arg1 arg2
(0) [ ]= A I
(1) - B -

while A := B[I] is naturally represented as

Location op arg1 arg2
(0) =[ ] B I
(1) := A (0)

• Indirect Triples:
– Another implementation of three-address code
that has been considered is to list pointers to
triples, rather than listing the triples
themselves.
– This implementation is naturally called “indirect
triples”.

Example 1:
Let us use an array STATEMENT to list pointers to
triples in desired order.
Then the three-address statements of
T1 := -B
T2 := C + D
T3 := T1 * T2
A := T3
might be represented as
Location Statement Location op arg1 arg2
(0) (14) (14) uminus B -
(1) (15) (15) + C D
(2) (16) (16) * (14) (15)
(3) (17) (17) := A (16)

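So, in the indirect triple representation, the triples are
stored once in their own table, and the STATEMENT array lists
pointers to them in the order in which the statements are to be
executed, instead of listing the triples themselves.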
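The STATEMENT array of Example 1 can be produced by a sketch like the one below. It is an illustration under assumptions: the triple table is modelled as a dictionary keyed by location, the starting location `base=14` simply reproduces the numbering used above, and the names `Triple` and `indirect_triples` are hypothetical.

```python
from typing import NamedTuple, Optional, Union

Arg = Union[str, int]  # a name, or the location of another triple


class Triple(NamedTuple):
    op: str
    arg1: Arg
    arg2: Optional[Arg]  # None when the field is unused


def indirect_triples(target, tree, base=14):
    """Return (statement, triples) for target := tree.

    triples maps a location (starting at base) to a Triple;
    statement lists those locations in execution order.
    """
    triples = {}
    statement = []

    def emit(tr):
        loc = base + len(triples)
        triples[loc] = tr
        statement.append(loc)  # pointer to the triple, in execution order
        return loc

    def walk(t):
        if isinstance(t, str):
            return t
        op, *args = t
        places = [walk(a) for a in args]
        return emit(Triple(op, places[0], places[1] if len(places) == 2 else None))

    emit(Triple(":=", target, walk(tree)))
    return statement, triples


statement, triples = indirect_triples("A", ("*", ("uminus", "B"), ("+", "C", "D")))
for i, loc in enumerate(statement):
    t = triples[loc]
    print(f"({i}) ({loc})    ({loc}) {t.op:<7} {t.arg1} {t.arg2 or '-'}")
```

Arguments inside the triples still refer to triple locations such as (14) and (15), never to positions in STATEMENT; this is what later allows statements to be reordered cheaply.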

Example 2:
Translate the following expression to
quadruple, triple and indirect triple:
(x + y) * (y + z) + (x + y + z)

Solution:
Given expression is (x + y) * (y + z) + (x + y + z)
The three address code is
t1 := x + y
t2 := y + z
t3 := t1 * t2
t4 := t1 + z
t5 := t3 + t4

Quadruple:
Location op arg1 arg2 Result
(0) + x y t1
(1) + y z t2
(2) * t1 t2 t3
(3) + t1 z t4
(4) + t3 t4 t5

Triple:
Location op arg1 arg2
(0) + x y
(1) + y z
(2) * (0) (1)
(3) + (0) z
(4) + (2) (3)

Indirect Triple:
Location Statement Location op arg1 arg2
(0) (11) (11) + x y
(1) (12) (12) + y z
(2) (13) (13) * (11) (12)
(3) (14) (14) + (11) z
(4) (15) (15) + (13) (14)

Example 3:
Translate the following expression into
quadruple, triple and indirect triple:
-(a + b) * (c + d) - (a + b + c)

Solution:
Given expression is -(a + b) * (c + d) - (a + b + c)
The three-address code is
t1 := a + b
t2 := -t1
t3 := c + d
t4 := t2 * t3
t5 := t1 + c
t6 := t4 - t5

Quadruple:
Location op arg1 arg2 Result
(0) + a b t1
(1) uminus t1 - t2
(2) + c d t3
(3) * t2 t3 t4
(4) + t1 c t5
(5) - t4 t5 t6

Triple:
Location op arg1 arg2
(0) + a b
(1) uminus (0) -
(2) + c d
(3) * (1) (2)
(4) + (0) c
(5) - (3) (4)

Indirect Triple:
Location Statement Location op arg1 arg2
(0) (10) (10) + a b
(1) (11) (11) uminus (10) -
(2) (12) (12) + c d
(3) (13) (13) * (11) (12)
(4) (14) (14) + (10) c
(5) (15) (15) - (13) (14)

Example 4:
Give the sequence of three-address code
instructions corresponding to the
arithmetic expression: x = 2 + 3 + 4 + 5

Solution:
The three address code for the above
sequence is as follows:
t1 := 2
t2 := t1 + 3
t3 := t2 + 4
t4 := t3 + 5
x := t4

Comparison of Representations:
The Use of Indirection
• The difference between triples and quadruples
may be regarded as a matter of how much
indirection is present in the representation.
– When we ultimately produce target code, each name,
temporary or programmer defined, will be assigned
some run-time memory location.
– This location will be placed in the symbol-table entry
for the datum. Using the quadruple notation, a three
address statement defining or using a temporary can
immediately access the location for that temporary
via the symbol table.
• A more important benefit of quadruples appears
in an optimizing compiler, where statements are
often moved around.
– Using quadruples, if we move a statement
computing x, the statements using x require no
change, because they refer to x by name.
– However, in the triple notation, moving a statement
that defines a temporary value requires us to change
all references to that statement in the arg1 and arg2
arrays. This problem makes triples difficult to use in an
optimizing compiler.

• With indirect triples, a statement can be moved by
reordering the statement list. Since pointers to
temporary values refer to the op-arg1-arg2 arrays,
which are not changed, none of those pointers need
to be changed.
– Thus, indirect triples look very much like quadruples
as far as their utility is concerned.
– Indirect triples can save some space compared with
quadruples, if some temporary value is used more
than once. The reason is that two or more entries in
the statement array can point to the same line of the
op-arg1-arg2 structure.
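The contrast between triples and indirect triples when a statement is moved can be sketched with the triples of Example 2. Here statement (3), which computes x + y + z and depends only on (0), is moved ahead of (1). The helper names `reorder_triples` and `reorder_indirect` and the list-of-tuples encoding are hypothetical. With quadruples nothing would need renumbering, because every result is named.

```python
def reorder_triples(triples, order):
    """Move triples into a new order; every position reference must be renumbered.

    triples: list of (op, arg1, arg2); an int argument is a position.
    order: the old positions, listed in their new order.
    """
    new_pos = {old: new for new, old in enumerate(order)}

    def fix(a):
        return new_pos[a] if isinstance(a, int) else a

    return [(op, fix(a1), fix(a2)) for op, a1, a2 in (triples[old] for old in order)]


def reorder_indirect(statement, order):
    """With indirect triples only the statement array is permuted."""
    return [statement[old] for old in order]


# Triples of Example 2: (x + y) * (y + z) + (x + y + z)
triples = [("+", "x", "y"), ("+", "y", "z"), ("*", 0, 1), ("+", 0, "z"), ("+", 2, 3)]
order = [0, 3, 1, 2, 4]  # move old (3) so that it follows old (0)

print(reorder_triples(triples, order))
# [('+', 'x', 'y'), ('+', 0, 'z'), ('+', 'y', 'z'), ('*', 0, 2), ('+', 3, 1)]

statement = [11, 12, 13, 14, 15]  # STATEMENT array of Example 2
print(reorder_indirect(statement, order))
# [11, 14, 12, 13, 15]  (the triples at (11)..(15) are untouched)
```

In the triple version, the references in the last two entries had to be rewritten; in the indirect version only the STATEMENT array changed.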

• Both triples and quadruples waste some
space, since fields will occasionally be empty.
If space is important, one can use a single
array and store either triples or quadruples
consecutively, using only as many words as
each record needs.
– The disadvantage of this representation is seen if we
try to examine the statements in reverse order, since
records of varying length cannot easily be located by
scanning backward.
