
Code Generation

 Code generation is the final phase of a compiler. It takes as input an IR of the source program, with supplementary information in the symbol table, and produces as output an equivalent target program.

 The source code written in a higher-level language is transformed into lower-level target (object) code, which should have at least the following properties:

 It should carry the exact meaning of the source code.

 It should be efficient in terms of CPU usage and memory management.

 The code generator produces target code for three-address statements, using registers to hold the operands of each statement.

 The code-generation techniques presented below can be used whether or not an optimizing phase occurs before code generation.

.

3
Code Generation
 Three address code
 The given expression is broken down into several separate instructions that can easily be translated into machine language.
 Each three-address instruction has at most three operands. Typically it combines an assignment with a single binary operator, as in x := y op z.
Example
Given Expression:
a := (-c * b) + (-c * d)
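The decomposition can be sketched in Python. The tuple encoding of the expression tree and the function name gen_tac are assumptions made for illustration; temporaries t1, t2, ... are allocated in post-order, so each operator's operands are computed before the operator itself.

```python
from itertools import count

# Expression trees are nested tuples (an assumed encoding): a str is a name,
# ("uminus", e) is unary minus, and (op, left, right) is a binary operation.
def gen_tac(target, expr):
    """Return the three-address statements for `target := expr`."""
    code = []
    temps = count(1)                      # supplies t1, t2, t3, ...

    def walk(e):
        if isinstance(e, str):            # leaf: use the name directly
            return e
        if e[0] == "uminus":              # unary operator: one source operand
            arg = walk(e[1])
            t = f"t{next(temps)}"
            code.append(f"{t} := -{arg}")
            return t
        op, left, right = e               # binary operator: operands first
        l, r = walk(left), walk(right)
        t = f"t{next(temps)}"
        code.append(f"{t} := {l} {op} {r}")
        return t

    code.append(f"{target} := {walk(expr)}")
    return code

# a := (-c * b) + (-c * d)
expr = ("+", ("*", ("uminus", "c"), "b"), ("*", ("uminus", "c"), "d"))
for stmt in gen_tac("a", expr):
    print(stmt)
```

This sketch does not detect common subexpressions, so -c is computed twice (t1 and t3); a DAG for the block would compute it only once.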
Three-address code is commonly represented in two forms: quadruples and triples.

 Quadruples
 A quadruple has four fields to implement a three-address instruction.
 The fields of a quadruple hold the operator, the first source operand, the second source operand and the result, respectively.

Example: a := -b * (c + d) is translated into the three-address code
t1 := -b
t2 := c + d
t3 := t1 * t2
a := t3


These statements are represented by quadruples as follows:

     Operator   Source 1   Source 2   Destination
1    uminus     b          -          t1
2    +          c          d          t2
3    *          t1         t2         t3
4    :=         t3         -          a
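A quadruple maps naturally onto a record with four fields. The Quad class and the to_tac helper below are hypothetical names for this sketch; to_tac renders each quadruple back as the three-address statement it encodes.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Quad:
    op: str               # operator: "uminus", "+", "*", ":=", ...
    arg1: str             # first source operand
    arg2: Optional[str]   # second source operand, None when unused ("-")
    result: str           # destination: a temporary or a program variable

def to_tac(q: Quad) -> str:
    """Render a quadruple as the three-address statement it encodes."""
    if q.op == "uminus":
        return f"{q.result} := -{q.arg1}"
    if q.op == ":=":
        return f"{q.result} := {q.arg1}"
    return f"{q.result} := {q.arg1} {q.op} {q.arg2}"

# a := -b * (c + d), one Quad per row of the quadruple table
quads = [
    Quad("uminus", "b", None, "t1"),
    Quad("+", "c", "d", "t2"),
    Quad("*", "t1", "t2", "t3"),
    Quad(":=", "t3", None, "a"),
]
for i, q in enumerate(quads, start=1):
    print(i, to_tac(q))
```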
 Triples
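 A triple has three fields to implement a three-address instruction.
 The fields of a triple hold the operator, the first source operand and the second source operand. There is no result field: a triple refers to the value computed by an earlier triple through that triple's position, written (1), (2), and so on.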

Example: the same statement, a := -b * (c + d), has the three-address code
t1 := -b
t2 := c + d
t3 := t1 * t2
a := t3


These statements are represented by triples as follows:

     Operator   Source 1   Source 2
1    uminus     b          -
2    +          c          d
3    *          (1)        (2)
4    :=         a          (3)
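Converting quadruples to triples amounts to dropping the result field and replacing each use of a temporary by the position of the triple that computes it. This is a sketch under two assumptions: quadruples are plain (op, arg1, arg2, result) tuples, and every result except the target of a := quadruple is a temporary. The function names are hypothetical.

```python
def quads_to_triples(quads):
    """Convert (op, arg1, arg2, result) quadruples into (op, arg1, arg2) triples.

    An int operand n in a triple means "the value computed by triple n".
    """
    position = {}                         # temporary -> 1-based triple number
    triples = []

    def ref(x):
        return position.get(x, x)         # temporaries become triple references

    for op, arg1, arg2, result in quads:
        if op == ":=":
            # The assigned variable becomes an operand: (:=, a, (n)).
            triples.append((":=", result, ref(arg1)))
        else:
            triples.append((op, ref(arg1), ref(arg2)))
            position[result] = len(triples)
    return triples

def show(x):
    return "-" if x is None else f"({x})" if isinstance(x, int) else x

quads = [("uminus", "b", None, "t1"), ("+", "c", "d", "t2"),
         ("*", "t1", "t2", "t3"), (":=", "t3", None, "a")]
for i, (op, a1, a2) in enumerate(quads_to_triples(quads), start=1):
    print(i, op, show(a1), show(a2))
```

Because triples have no result field, reordering them would invalidate these position references; that is the problem indirect triples address.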
 Code generator main tasks:

 Instruction selection: which instructions to use

Factors that determine the choice:

 Level of the IR

 Nature of the ISA (instruction set architecture)

 Desired quality of the generated code

 Register allocation and assignment

 Instruction ordering

ISSUES IN THE DESIGN OF A CODE GENERATOR
 The following issues arise during the code generation
phase:
 Input to code generator
 Target program
 Memory management
 Instruction selection
 Register allocation
 Evaluation order
Input to code generator
 The input to the code generator consists of the IR of the source program produced by the front end, together with information in the symbol table that is used to determine the run-time addresses of the data objects denoted by the names in the IR.

 The intermediate representation can be:


 Linear representation such as postfix notation
 Three address representation such as quadruples, triples, indirect triples.
 Virtual machine representation such as stack machine code and bytecodes.
 Graphical representations such as syntax trees and DAGs

 Prior to code generation, the source program must have been scanned, parsed and translated into IR by the front end, along with any necessary type checking. Therefore, the input to the code generator is assumed to be error-free.

Target program
 The output of the code generator is the target program. The output may be:
 Absolute machine language
 Producing an absolute machine language program as output has the advantage that
it can be placed in a fixed location in memory and immediately executed.

 Relocatable machine language


 Producing a relocatable machine language program as output allows subprograms
to be compiled separately.

 A set of relocatable object modules can be linked together and
loaded for execution by a linking loader.
 If the target machine does not handle relocation automatically,
the compiler must provide explicit relocation information
to the loader, to link the separately compiled program
segments.

 Assembly language
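 Producing an assembly language program as output makes the process of code generation somewhat easier: the code generator can emit symbolic instructions and use the macro facilities of the assembler, at the cost of an extra assembly step.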
Memory Management
 Names in the source program are mapped to addresses of data
objects in run-time memory by the front end and code generator.

 This mapping makes use of the symbol table: a name in a three-address statement refers to the symbol-table entry for that name.

 Labels in three-address statements have to be converted to addresses of instructions.

 Local variables are allocated on the stack, in activation records, while global variables are allocated in the static area.
Instruction selection
 The instruction set of the target machine should be complete and uniform.

 Instruction speeds and machine idioms are important factors when the efficiency of the target program is considered.

 The quality of the generated code is determined by its speed and size.

 The factors to be considered during instruction selection are:


 The uniformity and completeness of the instruction set.
 Instruction speed.
 Size of the instruction set.

 Translating each three-address statement independently, using a fixed code skeleton for each kind of statement, is simple but often produces redundant loads and stores.

For example, consider the three-address code:


a := b + c
d := a + e

Translating each statement independently gives the following inefficient assembly code:

MOV b, R0 R0 ← b

ADD c, R0 R0 ← c + R0

MOV R0, a a ← R0

MOV a, R0 R0 ← a

ADD e, R0 R0 ← e + R0

MOV R0 , d d ← R0

Here the fourth statement is redundant because the value of a is already in R0, and the third statement is also redundant if 'a' is not subsequently used.
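Both the statement-by-statement translation and the removal of the redundant fourth instruction can be sketched in Python. The skeleton only handles statements of the form x := y + z with the single register R0, and the peephole pass is an assumed simplification: it deletes a MOV that merely reverses the MOV just before it, which is safe only when no label lies between the two. Removing the third instruction would additionally require knowing that a is dead, which this sketch does not check.

```python
def naive_select(stmts):
    """Translate each x := y + z independently with one fixed code skeleton."""
    code = []
    for x, y, z in stmts:                    # (x, y, z) stands for x := y + z
        code += [f"MOV {y}, R0", f"ADD {z}, R0", f"MOV R0, {x}"]
    return code

def drop_redundant_moves(code):
    """Peephole pass: delete a MOV that just reverses the MOV before it.

    After 'MOV R0, a', memory location a and register R0 hold the same value,
    so a following 'MOV a, R0' changes nothing. Assumes no label between them.
    """
    out = []
    for instr in code:
        if out and instr.startswith("MOV "):
            src, dst = instr[4:].split(", ")
            if out[-1] == f"MOV {dst}, {src}":
                continue                     # redundant copy: skip it
        out.append(instr)
    return out

stmts = [("a", "b", "c"), ("d", "a", "e")]   # a := b + c ; d := a + e
for instr in drop_redundant_moves(naive_select(stmts)):
    print(instr)
```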


Register allocation
 Instructions involving register operands are usually shorter
and faster than those involving operands in memory.
Therefore efficient utilization of registers is particularly
important in generating good code.

 The use of registers is subdivided into two subproblems:


 Register allocation - the set of variables that will reside in
registers at a point in the program is selected.
 Register assignment - the specific register in which each selected variable will reside is picked.
Evaluation order
 The code generator decides the order in which the instructions will be executed.

 It affects the efficiency of the target code.

 Some computation orders require fewer registers to hold intermediate results than others.

 Picking the best order in the general case is a difficult, NP-complete problem.

 Initially, we shall avoid the problem by generating code for the three-
address statements in the order in which they have been produced by
the intermediate code generator.
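The effect of evaluation order on register needs can be illustrated with Sethi-Ullman numbering, a standard technique not described in these notes that finds an optimal order for expression trees without shared subexpressions (the NP-completeness above concerns the general case). The sketch assumes every operand, leaves included, must be loaded into a register, and compares a fixed left-to-right order with the Sethi-Ullman label.

```python
# Expression trees: a str is a leaf (name); (op, left, right) is a binary node.

def sethi_ullman(t):
    """Fewest registers needed to evaluate tree t without storing temporaries."""
    if isinstance(t, str):
        return 1                                  # a leaf is loaded into a register
    _, left, right = t
    l, r = sethi_ullman(left), sethi_ullman(right)
    # Evaluate the needier subtree first; only a tie costs one extra register.
    return l + 1 if l == r else max(l, r)

def left_first(t):
    """Registers needed if the left operand is always evaluated first."""
    if isinstance(t, str):
        return 1
    _, left, right = t
    # The left result occupies one register while the right side is evaluated.
    return max(left_first(left), 1 + left_first(right))

tree = ("+", "a", ("*", "b", ("-", "c", "d")))    # a + b * (c - d)
print(left_first(tree), sethi_ullman(tree))
```

For a + b * (c - d), always evaluating the left operand first needs 4 registers, while evaluating the operand with the larger label first needs only 2.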
Approaches to code generation issues
 The most important criterion is that the code generator always produces correct code. Correctness takes on special significance because of the number of special cases that a code generator might face.

 Some of the design goals of a code generator are:


 Correct.
 Easily maintainable.
 Testable.
 Efficient.
Basic Blocks and Control Flow Graphs
 A basic block is a sequence of statements that always execute one after another, in order.

 A basic block is a maximal sequence of consecutive three-address instructions with the following properties:
 Control enters the block only through its first instruction.
 Control leaves the block only through its last instruction.

 A control-flow graph is a directed graph $G = (V, E)$, where the nodes are the basic blocks and the edges correspond to the flow of control from one basic block to another. For example, the edge $e_{ij} = (v_i, v_j)$ corresponds to a transfer of control from basic block $v_i$ to basic block $v_j$.
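The partition into basic blocks is usually computed with the standard leader rules, which the notes do not spell out: the first instruction is a leader, any target of a jump is a leader, and any instruction immediately after a jump is a leader; each block runs from one leader up to the next. The sketch below assumes instructions are strings numbered from 1 and that jumps have the textual form goto n or if ... goto n; the function names are hypothetical.

```python
import re

JUMP = re.compile(r"goto (\d+)$")     # assumed form: "... goto n", n is 1-based

def basic_blocks(code):
    """Split numbered three-address code into blocks, each as (first, last)."""
    n = len(code)
    leaders = {1}                                  # rule 1: first instruction
    for i, instr in enumerate(code, start=1):
        m = JUMP.search(instr)
        if m:
            leaders.add(int(m.group(1)))           # rule 2: target of a jump
            if i < n:
                leaders.add(i + 1)                 # rule 3: instruction after a jump
    starts = sorted(leaders)
    ends = [s - 1 for s in starts[1:]] + [n]
    return list(zip(starts, ends))

def cfg_edges(code, blocks):
    """Edges (i, j) between block indices, for jumps and for fall-through."""
    block_at = {first: b for b, (first, _) in enumerate(blocks)}
    edges = set()
    for b, (_, last) in enumerate(blocks):
        instr = code[last - 1]
        m = JUMP.search(instr)
        if m:
            edges.add((b, block_at[int(m.group(1))]))
        # Control falls through unless the block ends in an unconditional goto.
        if not instr.startswith("goto") and b + 1 < len(blocks):
            edges.add((b, b + 1))
    return sorted(edges)

code = ["i := 0", "if i >= n goto 6", "t1 := i * 4",
        "i := i + 1", "goto 2", "x := t1"]
blocks = basic_blocks(code)
print(blocks)
print(cfg_edges(code, blocks))
```

Here the blocks are instructions [1], [2], [3..5] and [6] (block indices 0 to 3), and the edge (2, 1) is the back edge created by the loop.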
Directed Acyclic Graph
 A DAG is a tool that depicts the structure of a basic block and shows how the value computed by each statement is used by later statements of the block; it also supports optimizations such as sharing common subexpressions. A DAG makes transformations on basic blocks easy. In a DAG:
 Leaf nodes represent identifiers, names or constants.
 Interior nodes represent operators.
 Interior nodes also represent the results of expressions, and may be labeled with the identifiers/names to which those values are assigned.
For example, consider the basic block:
 t0 = a + b

 t1 = t0 + c

 d = t0 + t1
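The DAG for such a block can be built in the style of value numbering: an interior node is created only if no node with the same operator and children already exists, so repeated subexpressions share one node. The statement format (target, left, op, right) and the function name build_dag are assumptions for this sketch.

```python
def build_dag(block):
    """Build the DAG of a basic block given as (target, left, op, right) tuples."""
    nodes = []        # node id -> ("leaf", name) or (op, left_id, right_id)
    labels = {}       # node id -> names whose current value the node holds
    node_of = {}      # name -> node id of its current value
    index = {}        # node -> node id, so each distinct node is created once

    def get_node(key):
        if key not in index:
            index[key] = len(nodes)
            nodes.append(key)
            labels[index[key]] = []
        return index[key]

    def operand(name):
        if name not in node_of:                  # initial value: a leaf
            node_of[name] = get_node(("leaf", name))
        return node_of[name]

    for target, left, op, right in block:
        n = get_node((op, operand(left), operand(right)))
        old = node_of.get(target)
        if old is not None and target in labels[old]:
            labels[old].remove(target)           # target no longer names old value
        labels[n].append(target)
        node_of[target] = n
    return nodes, labels

block = [("t0", "a", "+", "b"), ("t1", "t0", "+", "c"), ("d", "t0", "+", "t1")]
nodes, labels = build_dag(block)
for i, node in enumerate(nodes):
    print(i, node, labels[i])
```

For this block the node for a + b (node 2, labeled t0) has two parents, the nodes labeled t1 and d, which is what makes the structure a DAG rather than a tree. Appending a statement e = a + b would reuse node 2 and add e to its labels.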

 .

20
Descriptors
 The code generator has to track both the registers (for availability) and the addresses (locations of values) while generating code. For this purpose, it uses the following two descriptors:

 Register descriptor :
 It is used to inform the code generator about the availability of registers.
 It keeps track of values stored in each register.
 Whenever a new register is required during code generation, this
descriptor is consulted for register availability.
 Initially, the register descriptor shows that all registers are empty.

 Address descriptor :
 An address descriptor records the location(s) where the current value of a name can be found at run time.
 Values of the names (identifiers) used in the program might be stored at different locations during execution.
 It is used to keep track of the locations where the values of identifiers are stored.
 These locations may include CPU registers, heaps, stacks, memory
or a combination of the mentioned locations.
getReg Function
 getReg: The code generator uses the getReg function to determine the status of available registers and the locations of name values. getReg works as follows:
 If variable Y is already in register R, it uses that register.
 Else, if some register R is available (empty), it uses that register.
 Else, if neither option is possible, it chooses a register whose reuse requires the minimal number of load and store instructions.
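The two descriptors and the three getReg rules can be sketched as follows. Representing a name's memory location by the name itself, and estimating the cost in rule 3 as the number of stores needed (values held only in that register), are assumptions of this sketch rather than part of the notes; the class and function names are hypothetical.

```python
class Descriptors:
    """Register and address descriptors; a name's memory location is the name itself."""

    def __init__(self, registers):
        self.reg = {r: set() for r in registers}   # register -> names it holds (all empty at first)
        self.addr = {}                             # name -> locations holding its current value

    def where(self, name):
        # A name not seen yet is assumed to be only in its own memory location.
        return self.addr.setdefault(name, {name})

def get_reg(d, y):
    """Choose a register for y using the three getReg rules above."""
    for r, names in d.reg.items():
        if y in names:                       # rule 1: y is already in register r
            return r
    for r, names in d.reg.items():
        if not names:                        # rule 2: some register is empty
            return r

    def stores_needed(r):
        # Rule 3 cost (simplified): each value held only in r must be stored first.
        return sum(1 for v in d.reg[r] if d.where(v) == {r})

    return min(d.reg, key=stores_needed)     # rule 3: cheapest register to reuse

d = Descriptors(["R0", "R1"])
print(get_reg(d, "a"))                       # all registers are initially empty
d.reg["R0"], d.addr["a"] = {"a"}, {"R0", "a"}   # a is in R0 and in memory
d.reg["R1"], d.addr["t1"] = {"t1"}, {"R1"}      # t1 is only in R1
print(get_reg(d, "t1"), get_reg(d, "b"))
```

With both registers occupied, b gets R0 because a also has a copy in memory, so reusing R0 needs no store, whereas reusing R1 would first require storing t1.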

A code-generation algorithm
 The algorithm takes a sequence of three-address statements as input. For each three-address statement of the form x := y op z, it performs the following actions:
 Invoke the function getReg to determine the location L where the result of the computation y op z should be stored.
 Consult the address descriptor for y to determine y', a current location of y. If the value of y is currently both in memory and in a register, prefer the register as y'. If the value of y is not already in L, generate the instruction MOV y', L to place a copy of y in L.
 Generate the instruction OP z', L, where z' is a current location of z; if z is in both a register and memory, prefer the register. Update the address descriptor of x to indicate that x is in location L. If L is a register, update its register descriptor to indicate that it holds the value of x, and remove x from all other register descriptors.
 If the current values of y and/or z have no next uses, are not live on exit from the block, and are in registers, alter the register descriptor to indicate that, after execution of x := y op z, those registers no longer contain y and/or z.
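The whole algorithm for one basic block can be sketched in Python using the notes' two-address format (MOV source, destination and OP source, destination, where OP updates the destination). Assumptions: the opcode names in OPS, a fixed set of two registers, liveness supplied as the set of names live on exit, and a simple spill of the first register when none is free. Rule 1 of getReg is tightened to the classic form: y's register is reused for x only if it holds nothing else and y has no later use, since otherwise the OP instruction would destroy a value still needed.

```python
OPS = {"+": "ADD", "-": "SUB", "*": "MUL"}      # assumed opcode names

def gen_code(block, registers=("R0", "R1"), live_on_exit=()):
    """Generate code for one basic block of (x, y, op, z) meaning x := y op z."""
    reg = {r: set() for r in registers}           # register descriptor
    addr = {}                                     # address descriptor
    code = []

    def where(n):
        return addr.setdefault(n, {n})            # initially only in memory

    def best(n):
        regs = sorted(l for l in where(n) if l in reg)
        return regs[0] if regs else n             # prefer a register to memory

    def used_later(n, i):
        return n in live_on_exit or any(n in (s[1], s[3]) for s in block[i + 1:])

    def get_reg(y, i):
        for r in registers:                       # y alone in r and dead after i
            if reg[r] == {y} and not used_later(y, i):
                return r
        for r in registers:                       # an empty register
            if not reg[r]:
                return r
        r = registers[0]                          # otherwise spill the first one
        for v in reg[r]:
            if where(v) == {r}:                   # r holds the only copy: store it
                code.append(f"MOV {r}, {v}")
                addr[v].add(v)
            addr[v].discard(r)
        reg[r].clear()
        return r

    for i, (x, y, op, z) in enumerate(block):
        L = get_reg(y, i)
        if L not in where(y):
            code.append(f"MOV {best(y)}, {L}")    # place a copy of y in L
        code.append(f"{OPS[op]} {best(z)}, {L}")  # L := L op z'
        for n in reg[L]:                          # L's old contents are gone
            addr[n].discard(L)
        for r in registers:                       # x now lives only in L
            reg[r].discard(x)
        reg[L] = {x}
        addr[x] = {L}
        for n in {y, z} - {x}:                    # free registers of dead operands
            if not used_later(n, i):
                for r in registers:
                    reg[r].discard(n)
                addr[n] = {l for l in where(n) if l not in reg}

    for n in sorted(live_on_exit):                # store live values held only in registers
        if n not in where(n):
            code.append(f"MOV {best(n)}, {n}")
    return code

block = [("t", "a", "-", "b"), ("u", "a", "-", "c"),
         ("v", "t", "+", "u"), ("d", "v", "+", "u")]
for instr in gen_code(block, live_on_exit={"d"}):
    print(instr)
```

For this block, with only d live on exit, the sketch emits MOV a, R0; SUB b, R0; MOV a, R1; SUB c, R1; ADD R1, R0; ADD R1, R0; MOV R0, d. The temporaries t, u and v never reach memory because they die inside the block.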
