
MODULE-2

INTERMEDIATE CODE GENERATION

Intermediate code generation is the last phase of the front end of a compiler. The front end translates a source program into an intermediate representation, from which the back end generates target code.
Although a source program can be translated directly into the target language, using a machine-independent intermediate form has several benefits:
1. Intermediate code is closer to the target machine than the source language is, and hence it is easier to generate target code from it.
2. Unlike machine language, intermediate code is (more or less) machine independent. This makes it easier to retarget the compiler.
3. It allows a variety of optimizations to be performed in a machine-independent way.
4. Intermediate code generation can be implemented via syntax-directed translation and thus can be folded into parsing by augmenting the code for the parser.
Intermediate code can take a number of forms, for example:
1. Three-Address Code(TAC)
2. P-Code
3. Byte code
P-Code: A language-specific intermediate code on which the majority of Pascal implementations are based.

Byte Code: A machine-independent intermediate code, such as Java bytecode, which is executed by the Java Virtual Machine (JVM).

Three-Address Code (TAC): Three-address code is a sequence of statements of the form

x = y op z
where x, y and z may be variable names, constants or compiler-generated variables (temporaries).
Since each statement refers to no more than three addresses, it is called a three-address statement.
However, some TAC statements involve fewer addresses.
For example, x = -y.
This has the form x = op y, where op is a unary operation such as unary minus, logical negation, a shift operator or a conversion operator.
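As an illustration of how such statements arise, the following sketch generates TAC for an assignment by walking the expression's syntax tree bottom-up and creating a fresh temporary for each operator node. The tuple tree encoding, the function name `gen_tac` and the temporary names t1, t2, ... are assumptions made for this example, not a fixed compiler interface.

```python
from itertools import count

# Assumed tree encoding (for illustration only):
#   leaf               -> str (variable name) or int (constant)
#   ("neg", e)         -> unary minus applied to e
#   (op, left, right)  -> binary operator such as "+" or "*"


def gen_tac(target, expr):
    """Return TAC statements (as strings) that compute expr into target."""
    code = []
    temps = count(1)  # yields 1, 2, 3, ... for fresh temporaries t1, t2, ...

    def walk(e):
        # A leaf is already an address, so no instruction is emitted.
        if isinstance(e, (str, int)):
            return str(e)
        if e[0] == "neg":
            arg = walk(e[1])
            t = f"t{next(temps)}"
            code.append(f"{t} = -{arg}")  # form x = op y
            return t
        op, left, right = e
        l_addr = walk(left)
        r_addr = walk(right)
        t = f"t{next(temps)}"
        code.append(f"{t} = {l_addr} {op} {r_addr}")  # form x = y op z
        return t

    result = walk(expr)
    code.append(f"{target} = {result}")  # final copy statement x = y
    return code


code = gen_tac("a", ("+", ("*", "b", ("neg", "c")), ("*", "b", ("neg", "c"))))
for stmt in code:
    print(stmt)
```

For a = b * -c + b * -c this prints t1 = -c, t2 = b * t1, t3 = -c, t4 = b * t3, t5 = t2 + t4 and a = t5. Every statement has at most three addresses; the sketch does not try to reuse the common subexpression b * -c.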
Some common three-address instructions:
1. Assignment instructions of the form x = y op z, where op is a binary arithmetic or logical operation.
Ex: a = b * c, c = d or e
2. Unary operations
The general form is x = op y, where op is a unary operation.
Ex:
x = -2
a = inttoreal b
a = ~b
3. Copy statement: The general form is x = y.
Here x is assigned the value of y.
4. Unconditional jump: goto L
The three-address instruction with label L is the next to be executed.
Ex:
100 : a:=b
101 : goto 104
……………….
103 : e:=f
104 : c:=d

5. Conditional jumps: if x relop y goto L
These apply a relational operator (<, ==, >, etc.) to x and y, and execute the instruction with label L next if x stands in relation relop to y.
If not, the three-address instruction following if x relop y goto L is executed next, in sequence.
6. Indexed assignment: The value of an array element is read with a statement of the form x = y[i], and a value is written into an array element with a statement of the form x[i] = y.
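7. Address and pointer assignment: TAC can also be enriched with statements for address and pointer assignment:
x := &y   (address assignment: x is set to the location of y)
x := *y   (x is set to the value stored at the location y points to)
*x := y   (the object x points to is set to the value of y)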
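To make the behaviour of these instructions concrete, here is a toy interpreter for a subset of them: x = y op z, copies, goto, conditional jumps and indexed assignment. Representing instructions as Python tuples and labels as list positions is an assumption of this sketch, and `run` is a hypothetical name.

```python
import operator

# Assumed instruction encoding (illustrative only); labels are list indices.
#   ("bin", x, y, op, z)      x = y op z
#   ("copy", x, y)            x = y
#   ("goto", L)               goto L
#   ("if", x, relop, y, L)    if x relop y goto L
#   ("load", x, y, i)         x = y[i]
#   ("store", x, i, y)        x[i] = y
BINOPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
RELOPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
          "!=": operator.ne, ">": operator.gt, ">=": operator.ge}


def run(program, env):
    """Execute a TAC program, updating and returning the variable environment."""
    def val(a):
        # An operand is either a constant (int) or a variable name (str).
        return a if isinstance(a, int) else env[a]

    pc = 0
    while pc < len(program):  # falling off the end stops execution
        ins = program[pc]
        kind = ins[0]
        pc += 1  # by default the next instruction in sequence runs next
        if kind == "bin":
            _, x, y, op, z = ins
            env[x] = BINOPS[op](val(y), val(z))
        elif kind == "copy":
            _, x, y = ins
            env[x] = val(y)
        elif kind == "goto":
            pc = ins[1]
        elif kind == "if":
            _, x, relop, y, label = ins
            if RELOPS[relop](val(x), val(y)):
                pc = label
        elif kind == "load":
            _, x, y, i = ins
            env[x] = env[y][val(i)]
        elif kind == "store":
            _, x, i, y = ins
            env[x][val(i)] = val(y)
        else:
            raise ValueError(f"unknown instruction: {ins!r}")
    return env


# s = a[0] + a[1] + ... + a[n-1]
program = [
    ("copy", "s", 0),              # label 0: s = 0
    ("copy", "i", 0),              # label 1: i = 0
    ("if", "i", ">=", "n", 7),     # label 2: if i >= n goto 7
    ("load", "t1", "a", "i"),      # label 3: t1 = a[i]
    ("bin", "s", "s", "+", "t1"),  # label 4: s = s + t1
    ("bin", "i", "i", "+", 1),     # label 5: i = i + 1
    ("goto", 2),                   # label 6: goto 2
]
env = run(program, {"a": [3, 5, 7], "n": 3})
print(env["s"])
```

The example program sums the array a and leaves 15 in s: the conditional jump at label 2 leaves the loop once i >= n, and control then falls off the end of the list.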
Implementing Three-Address Instructions
There are three common ways to implement three-address instructions:
1. Quadruples
2. Triples
3. Indirect Triples
Quadruples: Quadruples are one of the ways in which TAC statements can be implemented in a compiler.
A quadruple has four fields: operator, argument 1, argument 2 and result. Consider a simple TAC assignment statement x := y + z. It can be implemented as a quadruple as below.

Operator   Argument-1   Argument-2   Result
+          y            z            x

Some exceptions to the rules for quadruples:

1. Instructions with unary operations like x = -y, or copies like x = y, do not use argument-2. For a copy statement like x = y, op is =, while for most other operations the assignment operator is implied.
2. Operators like param use neither argument-2 nor result.
3. Conditional and unconditional jumps put the target label in result.
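A minimal sketch of producing quadruples for a = b * -c + b * -c, assuming a tuple-encoded syntax tree. In line with the rules above, the unary `minus` and the copy `=` leave argument-2 empty (None here), and every temporary appears explicitly in the result field. `Quad` and `to_quads` are hypothetical names.

```python
from itertools import count
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    op: str
    arg1: str
    arg2: Optional[str]  # None when the operator does not use argument-2
    result: str


def to_quads(target, expr):
    """Translate target = expr into quadruples.

    Assumed encoding: a leaf is a variable name (str), ("minus", e) is unary
    minus, and (op, left, right) is a binary operator.
    """
    quads = []
    temps = count(1)

    def walk(e):
        if isinstance(e, str):
            return e
        if e[0] == "minus":
            arg = walk(e[1])
            t = f"t{next(temps)}"
            quads.append(Quad("minus", arg, None, t))  # unary: no argument-2
            return t
        op, left, right = e
        a1 = walk(left)
        a2 = walk(right)
        t = f"t{next(temps)}"
        quads.append(Quad(op, a1, a2, t))
        return t

    value = walk(expr)
    quads.append(Quad("=", value, None, target))  # copy: op is "=", no argument-2
    return quads


quads = to_quads("a", ("+", ("*", "b", ("minus", "c")), ("*", "b", ("minus", "c"))))
for i, q in enumerate(quads):
    print(f"({i})  {q.op:<6}{q.arg1:<6}{q.arg2 or '':<6}{q.result}")
```

The temporaries t1 to t5 would each need a symbol-table entry, which is exactly the cost that triples avoid.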
Triples: To avoid entering temporary names into the symbol table, we can refer to a temporary value by the position of the statement that computes it. If we do so, three-address statements can be represented by records with only three fields:
op, arg1 and arg2, where arg1 and arg2 (the arguments of op) are either pointers to the symbol table or pointers into the triple structure itself (for temporary values).
Since three fields are used, this intermediate code format is known as triples.

Operator Argument-1 Argument-2
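For comparison, a sketch that builds triples for a = b * -c + b * -c. No temporaries are named: an argument that is an integer means "the value computed by the triple at that position", playing the role of the pointer into the triple structure described above. `Triple`, `to_triples` and `show` are hypothetical names, and the tuple tree encoding is assumed.

```python
from typing import NamedTuple, Optional, Union

# An argument is either a name from the symbol table (str) or the position
# of an earlier triple (int) whose value it refers to.
Arg = Union[str, int]


class Triple(NamedTuple):
    op: str
    arg1: Arg
    arg2: Optional[Arg] = None


def to_triples(target, expr):
    """Translate target = expr into triples; no temporary names are created.

    Assumed tree encoding: a leaf is a variable name (str), ("minus", e) is
    unary minus, and (op, left, right) is a binary operator. Constants are
    left out because an int argument here already means "position of a triple".
    """
    triples = []

    def emit(op, a1, a2=None):
        triples.append(Triple(op, a1, a2))
        return len(triples) - 1  # the triple's position stands for its value

    def walk(e):
        if isinstance(e, str):
            return e
        if e[0] == "minus":
            return emit("minus", walk(e[1]))
        op, left, right = e
        a1 = walk(left)
        a2 = walk(right)
        return emit(op, a1, a2)

    emit("=", target, walk(expr))  # copy: target gets the value of the last triple
    return triples


def show(arg):
    """Format positions as (n), names as-is, and a missing argument as blank."""
    if arg is None:
        return ""
    return f"({arg})" if isinstance(arg, int) else arg


triples = to_triples("a", ("+", ("*", "b", ("minus", "c")), ("*", "b", ("minus", "c"))))
for i, t in enumerate(triples):
    print(f"({i})  {t.op:<6}{show(t.arg1):<6}{show(t.arg2)}")
```

The printed table lists (0) minus c, (1) * b (0), (2) minus c, (3) * b (2), (4) + (1) (3) and (5) = a (4).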

Indirect Triples: If we use a list of pointers to triples, rather than listing the triples themselves, the implementation is called indirect triples.

New index   Index
10          (0)
20          (1)
30          (2)

Each index points to a triple (Op, Arg-1, Arg-2) in the triple table.
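The sketch below (hypothetical names `Triple`, `instruction` and `evaluate`) stores the triples for a = b * -c + b * -c once and keeps the program order in a separate list of pointers. Because triples refer to each other by their position in the triple table, the pointer list can be reordered, for instance by an optimizer, without rewriting any triple; this is the usual motivation for indirect triples.

```python
import operator
from typing import NamedTuple, Optional, Union

Arg = Union[str, int]  # str: symbol-table name, int: position in the triple table


class Triple(NamedTuple):
    op: str
    arg1: Arg
    arg2: Optional[Arg] = None


# Triple table for a = b * -c + b * -c
triples = [
    Triple("minus", "c"),  # (0)
    Triple("*", "b", 0),   # (1)
    Triple("minus", "c"),  # (2)
    Triple("*", "b", 2),   # (3)
    Triple("+", 1, 3),     # (4)
    Triple("=", "a", 4),   # (5) copy: a gets the value of (4)
]

# The program order is a separate list of pointers into the triple table.
instruction = [0, 1, 2, 3, 4, 5]

BINOPS = {"+": operator.add, "*": operator.mul}


def evaluate(triples, instruction, env):
    """Run the triples in the order given by `instruction` and return env."""
    values = {}  # value computed by each triple, keyed by its table position

    def val(a):
        return values[a] if isinstance(a, int) else env[a]

    for pos in instruction:
        t = triples[pos]
        if t.op == "minus":
            values[pos] = -val(t.arg1)
        elif t.op == "=":
            env[t.arg1] = val(t.arg2)
        else:
            values[pos] = BINOPS[t.op](val(t.arg1), val(t.arg2))
    return env


# Reordering only the pointer list; the triples themselves are untouched.
reordered = [2, 0, 3, 1, 4, 5]
print(evaluate(triples, instruction, {"b": 2, "c": 3})["a"])
print(evaluate(triples, reordered, {"b": 2, "c": 3})["a"])
```

With b = 2 and c = 3 both orders give a = -12, since each triple's dependencies are still computed before it is used.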

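Example: Write quadruples, triples and indirect triples for the following statement.
x[i] = y

Solution: For x[i] = y, the TAC will be

t1 = x[i]
t2 = y
t1 = t2

Here t1 denotes the array element x[i] as a storage location (an l-value), so the last instruction stores the value of y into x[i]. Written directly, the whole statement is the single indexed assignment x[i] = y.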

Quadruples

Opn   Arg-1   Arg-2   Result
=[]   x       i       t1
=     y       -       t2
=     t2      -       t1

Triples

      Opn   Arg-1   Arg-2
(0)   =[]   x       i
(1)   =     y       -
(2)   =     (1)     (0)

Indirect triples

New index   Index
10          (0)
20          (1)
30          (2)

      Opn   Arg-1   Arg-2
(0)   =[]   x       i
(1)   =     y       -
(2)   =     (1)     (0)

Type checking:
A source program should follow both the syntactic and semantic rules of the source language.
Some rules can be checked statically during compile time and other rules can be checked dynamically during run time.
Static checking includes the syntax checks performed by the parser and semantic checks such as type checks, flow-of-control checks, uniqueness checks and name-related checks.
Type checking therefore involves adding synthesized attributes (the types) to those parts of the language grammar that involve expressions and values.
The type checker sits between syntax analysis and intermediate code generation, as follows:

token stream (from lexical analyzer) → parser → syntax tree → type checker → syntax tree → intermediate code generator → intermediate representation

Types of type checking


There are two types of type checking.

1. Static type Checking


2. Dynamic type checking

Static type checking:
Static type checking refers to the compiler's checking of a program to ensure that the syntactic and semantic conventions of the source language are being followed. It includes:
i) Type checks: Operators and operands must have compatible types.
ii) Flow-of-control checks: Statements that cause flow of control to leave a construct must have some place to which control can be transferred.
Ex: A break statement in C must be enclosed in a loop or switch statement.
iii) Uniqueness checks: A language may dictate that, in some contexts, an entity can be defined exactly once, e.g. identifiers in declarations and case-statement labels.
iv) Name-related checks: Sometimes the same name must appear two or more times; e.g. in Ada a loop or block can have a name that must appear at both the beginning and the end of the construct.
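The flow-of-control check in ii) can be done with a simple walk over the statement tree that counts how many loops or switches enclose the current statement. The sketch below assumes a hypothetical tuple encoding of C-like statements; `check_breaks` is not taken from any real compiler.

```python
# Assumed statement encoding (hypothetical, for illustration):
#   ("break",)                       a break statement
#   ("while", body) / ("for", body)  loops
#   ("switch", body)                 a switch statement
#   ("if", body)                     any other compound statement
#   ("expr",)                        a simple statement
ENCLOSING = {"while", "for", "switch"}  # constructs a C break may leave


def check_breaks(stmts, depth=0, errors=None):
    """Report each break with no enclosing loop or switch to transfer control out of."""
    if errors is None:
        errors = []
    for stmt in stmts:
        kind = stmt[0]
        if kind == "break":
            if depth == 0:
                errors.append("break statement not within loop or switch")
        elif kind in ENCLOSING:
            check_breaks(stmt[1], depth + 1, errors)
        elif kind == "if":
            check_breaks(stmt[1], depth, errors)
    return errors


ok = [("while", [("expr",), ("if", [("break",)])])]
bad = [("expr",), ("if", [("break",)])]
print(check_breaks(ok))   # []
print(check_breaks(bad))  # ['break statement not within loop or switch']
```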
Dynamic type checking:
Type checks that can be performed only during run time are called dynamic type checking.

Static type checking vs. dynamic type checking

Static type checking:
1. If the compiler can verify at compile time that the program is free from type errors, the checking is called static type checking.
2. In a statically typed language, every program the compiler accepts has already been type-checked, so the symbol table may be discarded before execution time.

Dynamic type checking:
1. If freedom from type errors can only be verified while the program runs, the checks are made at run time; this is called dynamic type checking.
2. If a language requires dynamic type checking, some or all of the symbol table must be accessible at run time.

EXAMPLE

Fig. (a): syntax tree for 7 + 5 * 3
Fig. (b): syntax tree for 7 < 5 * 3
Fig. (c): syntax tree for 4 < (5 < 3)

In Fig. (a) the nodes of the tree do not yet carry type information, so the type checker must compute and check the type of each node.
Fig. (b) shows a tree in which every node is correctly typed.
Fig. (c) contains a type error: an integer is compared with a boolean expression (the result of the inner comparison), which is not allowed.
Such errors must be detected and eliminated.
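A sketch of a type checker for the three trees above, reading them (as an assumption about the figures) as 7 + 5 * 3, 7 < 5 * 3 and 4 < (5 < 3). The type of each node is a synthesized attribute computed from its children. The operator signatures in `SIGNATURES` are assumed for illustration: + and * take integers to an integer, and < takes integers to a boolean.

```python
class TypeCheckError(Exception):
    """Raised when operator and operand types are incompatible."""


# Assumed operator signatures: (required operand type, result type)
SIGNATURES = {
    "+": ("integer", "integer"),
    "*": ("integer", "integer"),
    "<": ("integer", "boolean"),
}


def type_of(node):
    """Compute the synthesized type attribute of an expression tree bottom-up.

    A leaf is a Python int (an integer constant); an interior node is
    (op, left, right).
    """
    if isinstance(node, int):
        return "integer"
    op, left, right = node
    operand_type, result_type = SIGNATURES[op]
    lt, rt = type_of(left), type_of(right)
    if lt != operand_type or rt != operand_type:
        raise TypeCheckError(f"operator {op!r} applied to {lt} and {rt}")
    return result_type


fig_a = ("+", 7, ("*", 5, 3))  # 7 + 5 * 3
fig_b = ("<", 7, ("*", 5, 3))  # 7 < 5 * 3
fig_c = ("<", 4, ("<", 5, 3))  # 4 < (5 < 3)

print(type_of(fig_a))  # integer
print(type_of(fig_b))  # boolean
try:
    type_of(fig_c)
except TypeCheckError as err:
    print("type error:", err)  # type error: operator '<' applied to integer and boolean
```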
There are two kinds of type conversion:
1. Explicit type conversion
2. Implicit type conversion

1. Explicit type conversion: If the programmer must write extra code to have the conversion performed, we call it an explicit type cast or conversion.
2. Implicit type conversion: If the compiler performs this transformation without direction from the programmer, it is called implicit type conversion (coercion).

Typically, implicit type conversion is done only on the built-in types.
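As a sketch of implicit conversion, the function below walks an arithmetic expression tree and, wherever an integer operand meets a real one, wraps the integer side in an `inttoreal` conversion node. Only the two built-in types integer and real are modelled; the tree encoding and the name `coerce` are assumptions made for this example.

```python
def coerce(node):
    """Return (tree, type), inserting inttoreal where an integer meets a real.

    Assumed encoding: int and float leaves are integer and real constants;
    an interior node is (op, left, right) for an arithmetic operator.
    """
    if isinstance(node, int):
        return node, "integer"
    if isinstance(node, float):
        return node, "real"
    op, left, right = node
    left, lt = coerce(left)
    right, rt = coerce(right)
    if lt == rt:
        return (op, left, right), lt
    # Mixed operands: widen the integer side to real (implicit conversion).
    if lt == "integer":
        left = ("inttoreal", left)
    else:
        right = ("inttoreal", right)
    return (op, left, right), "real"


print(coerce(("+", 1, 2.5)))
print(coerce(("*", ("+", 1, 2), 2.5)))
```

In the second call the integer sum 1 + 2 is computed first and only its result is converted, which is what a compiler inserting the conversion bottom-up would do.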

Examples: expression trees [a] to [d] (figure): [a] and [b] illustrate explicit type conversion; [c] and [d] illustrate implicit type conversion.

Note:  indicates that a location is being dereferenced and turned into a value.

Runtime Environments
The compiler creates and manages a run-time environment in which it assumes its target programs are being executed.

This environment deals with a variety of issues, such as the layout and allocation of storage locations for the objects named in the source program, the mechanisms used by the target program to access variables, the linkage between procedures, the mechanisms for passing parameters, and the interfaces to the operating system.

Storage organization
The executing target program runs in its own logical address space in which each program value has
a location.

The management and organization of this logical address space is shared between the compiler,
operating system & target machine.

The operating system maps the logical addresses into physical addresses, which are usually spread throughout memory.

The run time representation of an object program in the logical address space consists of data &
program areas.

Code

Static

Heap

Free Memory

Stack

1. Code area: This contains the generated target code.

2. Static area: This contains data whose absolute address can be determined at compile time.

For example, in FORTRAN the addresses of all variables can be determined statically, and therefore they can be kept in the static area.

In C, global and static variables are also kept in this area.

3. Stack area and heap area: To maximize the utilization of space at run time, the stack and the heap are placed at opposite ends of the remainder of the address space.

These areas are dynamic; their sizes can change as the program executes, and they grow towards each other as needed.

The stack is used to store data structures called activation records that are generated during procedure calls.

The heap area holds data allocated at run time, including objects pointed to by pointer types.
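A toy model of this layout, assuming a single contiguous region in which the heap grows upward and the stack grows downward, so that an allocation fails only when the two would meet. Addresses are plain integers and `AddressSpace` is a hypothetical class, not how any particular operating system or runtime manages memory.

```python
class AddressSpace:
    """Toy model of the free region between heap and stack.

    The heap grows upward from heap_base; the stack grows downward from top.
    Addresses are plain integers; sizes are in bytes.
    """

    def __init__(self, heap_base, top):
        self.heap_top = heap_base  # next free address above the heap
        self.stack_top = top       # lowest address currently used by the stack

    def free_bytes(self):
        return self.stack_top - self.heap_top

    def heap_alloc(self, size):
        """Reserve size bytes on the heap and return their start address."""
        if size > self.free_bytes():
            raise MemoryError("heap would run into the stack")
        addr = self.heap_top
        self.heap_top += size
        return addr

    def push_frame(self, size):
        """Push an activation record of size bytes and return its start address."""
        if size > self.free_bytes():
            raise MemoryError("stack would run into the heap")
        self.stack_top -= size
        return self.stack_top

    def pop_frame(self, size):
        """Pop the most recent activation record of size bytes."""
        self.stack_top += size


space = AddressSpace(heap_base=1000, top=2000)
obj = space.heap_alloc(100)    # heap now occupies 1000..1099
frame = space.push_frame(200)  # stack now occupies 1800..1999
print(obj, frame, space.free_bytes())  # 1000 1800 700
```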

Static Versus Dynamic Storage Allocation

The two adjectives static and dynamic, applied to storage allocation, distinguish between compile time and run time respectively.

A storage-allocation decision is static if it can be made by the compiler looking only at the text of the program, not at what the program does when it executes.

A decision is dynamic if it can be decided only when the program is running.

1) Stack storage: Names local to a procedure are allocated space on a stack. The stack supports the normal
call/return policy for procedures.

2) Heap storage: Data that may outlive the call to the procedure that created it is usually allocated on a heap of reusable storage.

The heap is an area of virtual memory that allows objects or other data elements to obtain storage when they are created and to return that storage when they are invalidated.

To support heap management, garbage collection enables the run-time system to detect useless data elements and reuse their storage.
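The notes do not fix a particular garbage-collection algorithm. As one common example, here is a mark-and-sweep sketch: it marks every object reachable from the roots (the program's variables) and then reclaims the rest. Modelling the heap as a dict from object ids to the ids each object points to is an assumption for illustration.

```python
def mark_and_sweep(heap, roots):
    """Free every heap object not reachable from the roots.

    heap:  dict mapping an object id to the ids it points to
    roots: ids referenced directly by program variables (stack, static area)
    Returns the ids that were reclaimed.
    """
    # Mark phase: depth-first traversal from the roots.
    marked = set()
    pending = list(roots)
    while pending:
        obj = pending.pop()
        if obj in marked:
            continue
        marked.add(obj)
        pending.extend(heap[obj])

    # Sweep phase: reclaim everything that was not marked.
    garbage = [obj for obj in heap if obj not in marked]
    for obj in garbage:
        del heap[obj]
    return garbage


heap = {"A": ["B"], "B": [], "C": ["D"], "D": ["C"]}
print(mark_and_sweep(heap, roots=["A"]))  # ['C', 'D']
print(heap)                               # {'A': ['B'], 'B': []}
```

The cycle C ↔ D is reclaimed because nothing reachable from the root A points into it.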

Activation Record
Defn: An activation record is a conceptual aggregate of data which contains all information required for a
single activation of a procedure.

Activation records are pushed onto the stack when a procedure is called and popped when the procedure returns.

Activation records are held in the static area for languages like FORTRAN and in the stack area for languages like Pascal.

The activation record contains the following fields:

Actual parameters
Returned value
Control link
Access link
Saved machine status
Local data
Temporaries
[Activation record table]

1. Temporaries: Temporary values used during expression evaluation.

2. Local data: Data belonging to the local environment of the currently activated procedure.

3. Saved machine status: Information about the state of the machine just before the call, such as the return address and the contents of registers. If the called procedure wants to use registers used by the calling procedure, these have to be saved before, and restored after, the execution of the called procedure.

4. Access link: An access link may be needed to locate data needed by the called procedure but found elsewhere, e.g. in another activation record; i.e. the access link is used to access non-local names.

5. Control link: Points to the activation record of the caller (the calling procedure).

6. Returned value: Space for the return value of the called function, if any. Not all called procedures return a value, and if one does, we may prefer to place that value in a register for efficiency.

7. Actual parameters: The actual parameters supplied by the calling procedure.
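The call/return behaviour of activation records can be sketched as a control stack: each call pushes a record whose control link points to the caller's record, and each return pops it. The field names mirror the list above, but the dataclass layout, the names `ActivationRecord` and `ControlStack`, and the nesting of the procedures f and g inside main are assumptions; a real compiler lays these fields out at fixed offsets in memory.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ActivationRecord:
    procedure: str
    actual_params: List[Any]
    control_link: Optional["ActivationRecord"]  # record of the caller
    access_link: Optional["ActivationRecord"]   # record used to reach non-local names
    return_value: Any = None
    saved_status: Dict[str, Any] = field(default_factory=dict)  # e.g. return address, registers
    local_data: Dict[str, Any] = field(default_factory=dict)
    temporaries: Dict[str, Any] = field(default_factory=dict)


class ControlStack:
    """Pushes a record on each call and pops it on return."""

    def __init__(self):
        self.records: List[ActivationRecord] = []

    def call(self, procedure, params, access_link=None):
        caller = self.records[-1] if self.records else None
        record = ActivationRecord(procedure, list(params),
                                  control_link=caller, access_link=access_link)
        self.records.append(record)
        return record

    def ret(self, value=None):
        record = self.records.pop()
        record.return_value = value
        return record


stack = ControlStack()
main = stack.call("main", [])
f = stack.call("f", [3], access_link=main)  # f assumed nested in main
g = stack.call("g", [4], access_link=main)  # g also nested in main, called from f
print([r.procedure for r in stack.records])  # ['main', 'f', 'g']
print(g.control_link.procedure, g.access_link.procedure)  # f main
stack.ret(value=42)
print(stack.records[-1].procedure)  # f
```

Here g is called from f, so its control link is f's record, while its access link points to main's record, through which g would reach non-local names.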
