Compiler Design
Chapter 6: Intermediate Language
Instructor: Fikru T. (MSc.)
Email: fikrutafesse08@gmail.com
Introduction
• The task of a compiler is to convert a source program into a machine program.
• This translation can be done directly, but it is not always possible to generate
machine code directly in one pass.
• Therefore, compilers typically generate an easy-to-represent form of the source program
called an intermediate language.
• Generating an intermediate language leads to more efficient code generation.
• Intermediate code is used to translate the source code into machine code.
• Intermediate code lies between the high-level language and the machine language.
Lexical Analyzer → Parser → Static Checker → Intermediate Code Generator → (intermediate code) → Rest of compiler
Figure: Position of intermediate code generator
• If the compiler translates source code directly into machine code without generating
intermediate code, then a full native compiler is required for each new machine.
• Intermediate code keeps the analysis portion the same for all target machines, so a
full compiler is not needed for every distinct machine.
• The intermediate code generator receives its input from the preceding phase, the
semantic analyzer.
• It takes this input in the form of an annotated syntax tree.
• With intermediate code, only the second part of the compiler, the synthesis phase, is
changed according to the target machine.
Benefits of Intermediate Code Generation
• There are certain benefits of generating machine-independent intermediate code.
1. A compiler for a different machine can be created by attaching a back end for the
new machine to an existing front end.
2. A compiler for a different source language (on the same machine) can be created
by providing a front end for that source language to an existing
back end.
3. A machine-independent code optimizer can be applied to the intermediate code in
order to improve the generated code.
Intermediate Languages
• There are three types of intermediate code representations.
1. Syntax Tree
2. Postfix Notation
3. Three address code
• Each of these is described in turn below.
1. Syntax Tree
• A syntax tree represents the natural hierarchical structure of the source program.
• A Directed Acyclic Graph (DAG) is very similar to a syntax tree but is more
compact, because a common subexpression is represented only once.
• The intermediate code should be generated in such a way that the processing in
the subsequent phases is easy.
• Example: consider the input string x := -a * b + -a * b for the syntax tree and DAG
representation.
• Solution:
Syntax tree (children are indented under their parent):
:=
  x
  +
    *
      uminus
        a
      b
    *
      uminus
        a
      b
DAG (both operands of + point to the same * node):
:=
  x
  +
    *
      uminus
        a
      b
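The difference between the two structures can be sketched in a few lines of Python. This is only an illustration, not a method taken from the slides: the names `DagBuilder`, `build_example`, and `make_tree_node` are hypothetical, and nodes are stored as plain tuples. The tree constructor creates a fresh node on every request, while the DAG constructor returns the existing node when the same operator and children are requested again.

```python
class DagBuilder:
    """Creates expression nodes, reusing a node that was already built."""

    def __init__(self):
        self.nodes = []    # node id -> (op, children)
        self._index = {}   # (op, children) -> node id

    def node(self, op, *children):
        key = (op, children)
        if key not in self._index:        # first time this subexpression is seen
            self._index[key] = len(self.nodes)
            self.nodes.append(key)
        return self._index[key]


def build_example(make):
    """Build x := -a * b + -a * b with the given node constructor."""
    def term():
        # For a leaf, the single "child" is the identifier name.
        return make("*", make("uminus", make("id", "a")), make("id", "b"))
    return make(":=", make("id", "x"), make("+", term(), term()))


# Syntax tree: every request creates a new node, even for repeated subexpressions.
tree_nodes = []

def make_tree_node(op, *children):
    tree_nodes.append((op, children))
    return len(tree_nodes) - 1


build_example(make_tree_node)
dag = DagBuilder()
build_example(dag.node)

print(len(tree_nodes), len(dag.nodes))  # prints: 11 7
```

The tree needs 11 nodes because -a * b is built twice, while the DAG needs only 7 because the second request for -a * b finds the nodes created by the first.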
2. Postfix Notation
• Postfix notation writes each operator after its operands.
• For the input expression x := -a * b + -a * b, the required postfix form
is: x a uminus b * a uminus b * + :=
• Basically, postfix notation is a linearization of the syntax tree.
• In this representation, each operator can be easily associated with its
operands.
• It is a natural representation for expression evaluation, because a postfix string can
be evaluated from left to right with a stack.
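As a minimal sketch of these points, the Python below produces the postfix form by a post-order walk of the syntax tree and then evaluates a postfix string with a stack. The tuple encoding of the tree, the function names `postfix` and `evaluate`, and the sample values for a and b are assumptions made for illustration.

```python
def postfix(node):
    """Post-order walk: emit the operands first, then the operator."""
    if isinstance(node, str):          # leaf: an identifier
        return [node]
    op, *operands = node
    out = []
    for child in operands:
        out.extend(postfix(child))
    out.append(op)
    return out


def evaluate(tokens, env):
    """Evaluate a postfix token list with a stack (supports uminus, +, *)."""
    stack = []
    for tok in tokens:
        if tok == "uminus":
            stack.append(-stack.pop())
        elif tok in ("+", "*"):
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if tok == "+" else left * right)
        else:
            stack.append(env[tok])     # identifier: look up its value
    return stack.pop()


# x := -a * b + -a * b, written as nested tuples (operator, operand, ...)
term = ("*", ("uminus", "a"), "b")
rhs = ("+", term, term)
tree = (":=", "x", rhs)

print(" ".join(postfix(tree)))                 # x a uminus b * a uminus b * + :=
print(evaluate(postfix(rhs), {"a": 2, "b": 3}))  # -12
```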
3. Three address code (TAC)
• It is used by optimizing compilers.
• In three address code, the given expression is broken down into several separate
instructions.
• These instructions can be easily translated into assembly language.
• Each three address code instruction has at most three addresses: two operands and one result.
• A typical instruction combines an assignment with a binary operator, as in x = y op z.
• In TAC, there is at most one operator on the right side of an instruction.
• Example 1: x + y * z
t1 = y * z
t2 = x + t1
• t1 and t2 are compiler-generated temporary names.
• Example 2: x = a + a * (b - c) + d + (b - c)
t1 = b - c
t2 = a * t1
t3 = a + t2
t4 = d + t1
t5 = t3 + t4
x = t5
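The way an expression is broken into such instructions can be sketched as a recursive walk over the expression tree that emits one instruction per operator. This is a simplified illustration, not the slides' algorithm: the function name `gen_tac` and the tuple encoding of the tree are assumptions, and only binary operators are handled.

```python
import itertools


def gen_tac(node, code, temps):
    """Return the name holding node's value, appending instructions to code."""
    if isinstance(node, str):            # leaf: the name is its own address
        return node
    op, left, right = node
    left_name = gen_tac(left, code, temps)
    right_name = gen_tac(right, code, temps)
    temp = f"t{next(temps)}"             # fresh compiler-generated temporary
    code.append(f"{temp} = {left_name} {op} {right_name}")
    return temp


# Example 1: x + y * z
code = []
gen_tac(("+", "x", ("*", "y", "z")), code, itertools.count(1))
for line in code:
    print(line)
# t1 = y * z
# t2 = x + t1
```

Note that this sketch creates a new temporary for every operator. It does not detect that b - c occurs twice in Example 2, so it would not reuse t1 there; that sharing needs a DAG or a separate optimization step.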
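• Three address code is commonly implemented in one of two representations: Quadruples and Triples.
1. Quadruples: each instruction has four fields: operator, source1, source2, and result (destination).
• Example: a := -b * (c + d)
TAC: t1 = -b
t2 = c + d
t3 = t1 * t2
a := t3
2. Triples: each instruction has three fields: operator, source1, and source2. There is no result
field; a later instruction refers to an earlier result by the position of the instruction that computed it.
• Example: a := -b * (c + d)
TAC: t1 = -b
t2 = c + d
t3 = t1 * t2
a := t3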
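To make the two layouts concrete, the following Python sketch stores the three computing instructions of the example (t1 = -b, t2 = c + d, t3 = t1 * t2) as quadruples and then derives triples from them. The `Quad` type and the `to_triples` helper are hypothetical names introduced here; in the triples, an integer operand stands for the index of the earlier triple whose result is used, and the final assignment to a is left out to keep the sketch short.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    op: str
    arg1: str
    arg2: Optional[str]   # None for unary operators
    result: str


quads = [
    Quad("uminus", "b", None, "t1"),
    Quad("+", "c", "d", "t2"),
    Quad("*", "t1", "t2", "t3"),
]


def to_triples(quads):
    """Drop the result field; refer to earlier results by instruction index."""
    position = {}   # temporary name -> index of the triple that computes it
    triples = []
    for q in quads:
        arg1 = position.get(q.arg1, q.arg1)
        arg2 = position.get(q.arg2, q.arg2)
        position[q.result] = len(triples)
        triples.append((q.op, arg1, arg2))
    return triples


for index, triple in enumerate(to_triples(quads)):
    print(index, triple)
# 0 ('uminus', 'b', None)
# 1 ('+', 'c', 'd')
# 2 ('*', 0, 1)
```

Triples save the space of the result field and of the temporary names, at the cost that moving an instruction changes the indices other instructions refer to.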
Declarations
• When we encounter declarations, we need to lay out storage for the declared variables.
• For every local name in a procedure, we create a symbol table (ST) entry containing:
1. The type of the name
2. How much storage the name requires
• The productions:
D → integer, id
D → real, id
D → D1, id
• A suitable translation scheme for declarations attaches semantic actions to these
productions, using:
• ENTER, which makes the entry into the
symbol table.
• ATTR, which traces the data type.
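The semantic actions themselves are not shown above, so the Python below is only a sketch of how such a scheme is commonly realized, under stated assumptions: the storage widths (4 bytes for integer, 8 for real), the `SymbolTable` class, and the `declare` function are all hypothetical. The first "type, id" pair corresponds to D → integer, id or D → real, id; each further ", id" corresponds to D → D1, id, where the type recorded in ATTR is passed along.

```python
WIDTH = {"integer": 4, "real": 8}   # assumed storage sizes in bytes


class SymbolTable:
    def __init__(self):
        self.entries = {}
        self.offset = 0             # next free storage location

    def enter(self, name, type_name):
        """ENTER: record the name's type, width, and storage offset."""
        if name in self.entries:
            raise ValueError(f"duplicate declaration of {name}")
        width = WIDTH[type_name]
        self.entries[name] = {"type": type_name, "width": width,
                              "offset": self.offset}
        self.offset += width


def declare(tokens, table):
    """Process a token list such as ["real", ",", "x", ",", "y"]; return D.ATTR."""
    type_name, rest = tokens[0], tokens[1:]
    if type_name not in WIDTH:
        raise ValueError(f"unknown type {type_name}")
    if len(rest) < 2 or len(rest) % 2 != 0:
        raise ValueError("expected ', id' after the type")
    attr = type_name                # D.ATTR, carried through D -> D1, id
    for comma, name in zip(rest[0::2], rest[1::2]):
        if comma != ",":
            raise ValueError("expected ','")
        table.enter(name, attr)     # ENTER(id, D.ATTR)
    return attr


table = SymbolTable()
declare(["real", ",", "x", ",", "y"], table)
declare(["integer", ",", "n"], table)
print(table.entries["y"])   # {'type': 'real', 'width': 8, 'offset': 8}
print(table.offset)         # 20
```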