Run-Time Environments: Stack Allocation of Space, Access to Nonlocal Data on the Stack,
Heap Management, Introduction to Garbage Collection, Introduction to Trace-Based
Collection.
Code Generation: Issues in the Design of a Code Generator, The Target Language, Addresses
in the Target Code, Basic Blocks and Flow Graphs, Optimization of Basic Blocks, A Simple
Code Generator, Peephole Optimization, Register Allocation and Assignment, Dynamic
Programming Code-Generation.
Almost all compilers for languages that use procedures, functions, or methods as units of
user-defined actions manage at least part of their run-time memory as a stack. Each time a
procedure is called, space for its local variables is pushed onto a stack, and when the
procedure terminates, that space is popped off the stack. As we shall see, this arrangement not
only allows space to be shared by procedure calls whose durations do not overlap in time, but
it allows us to compile code for a procedure in such a way that the relative addresses of its
nonlocal variables are always the same, regardless of the sequence of procedure calls.
Stack allocation is a scheme in which a stack is used to organize storage. The stack used
in stack allocation is known as the control stack. In this type of allocation, data objects are
created dynamically. Memory is allocated in units of activation records, which are pushed
onto the stack in Last In, First Out (LIFO) order. Locals are stored in the activation records
at run time, and memory addressing is done using pointers and registers.
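The push-and-pop discipline described above can be illustrated with a short Python sketch. All names here are illustrative, not taken from any real compiler's run-time system:

```python
# Sketch of stack allocation: each call pushes an activation record
# holding the procedure's locals; each return pops it (LIFO order).

class ActivationRecord:
    def __init__(self, proc_name, local_names):
        self.proc_name = proc_name
        # Locals live at fixed offsets within the record, so the
        # compiler can address them relative to the record's base.
        self.locals = {name: None for name in local_names}

control_stack = []

def call(proc_name, local_names):
    """Simulate a procedure call: push an activation record."""
    record = ActivationRecord(proc_name, local_names)
    control_stack.append(record)
    return record

def ret():
    """Simulate a return: pop the record, releasing its locals' space."""
    return control_stack.pop()

# main calls p, p calls q; q terminates before p does, so q's space
# can later be reused by any other call p makes.
call("main", ["x"])
call("p", ["a", "b"])
call("q", ["t"])
assert [r.proc_name for r in control_stack] == ["main", "p", "q"]
ret()  # q finishes first (LIFO)
assert [r.proc_name for r in control_stack] == ["main", "p"]
```

Because each record sits at a known position relative to the top of the stack while its procedure runs, locals can be addressed by fixed offsets, as the text above notes.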
Page 1 of 22
The scope of a declaration in a block-structured language is given by the most closely nested
rule:
● The scope of a declaration in a block B includes B.
● If a name X is not declared in a block B, then an occurrence of X in B is in the scope of a
declaration of X in an enclosing block B´ such that
● B´ has a declaration of X, and
● B´ is more closely nested around B than any other block with a declaration of X.
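The most-closely-nested rule amounts to searching outward from the block where a name occurs. A small Python sketch (the chain representation is illustrative):

```python
# Resolve a name under the most-closely-nested rule: walk outward
# through enclosing blocks and take the first (innermost) block
# that declares the name.

def resolve(name, block_chain):
    """block_chain lists the declared names of block B, then of B's
    enclosing blocks, from innermost to outermost."""
    for depth, declarations in enumerate(block_chain):
        if name in declarations:
            return depth  # 0 means declared in B itself
    raise NameError(name)

# B declares x; its enclosing block declares x and y; the outermost
# block declares y and z.
chain = [{"x"}, {"x", "y"}, {"y", "z"}]
assert resolve("x", chain) == 0  # B's own declaration wins
assert resolve("y", chain) == 1  # most closely nested enclosing block
assert resolve("z", chain) == 2
```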
Allocation. When a program requests memory for a variable or object, the memory
manager produces a chunk of contiguous heap memory of the requested size. If
possible, it satisfies an allocation request using free space in the heap; if no chunk of
the needed size is available, it seeks to increase the heap storage space by getting
consecutive bytes of virtual memory from the operating system. If space is exhausted,
the memory manager passes that information back to the application program.
De-allocation. The memory manager returns de-allocated space to the pool of free
space, so it can reuse the space to satisfy other allocation requests.
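The allocate/deallocate cycle described above can be sketched with a first-fit free list. This is a minimal Python sketch under simplifying assumptions: chunks are (start, size) pairs, the "operating system" is modeled by a heap-limit counter, and freed chunks are not coalesced:

```python
# First-fit free-list sketch of heap allocation and de-allocation.

HEAP_GROW_STEP = 64  # bytes requested from the "OS" at a time

class MemoryManager:
    def __init__(self):
        self.free_list = []   # list of (start, size) free chunks
        self.heap_limit = 0   # bytes obtained from the "OS" so far

    def allocate(self, size):
        # Try to satisfy the request from free space in the heap first.
        for idx, (start, chunk_size) in enumerate(self.free_list):
            if chunk_size >= size:
                del self.free_list[idx]
                if chunk_size > size:  # keep the remainder free
                    self.free_list.append((start + size, chunk_size - size))
                return start
        # Otherwise grow the heap with fresh consecutive bytes.
        start = self.heap_limit
        self.heap_limit += max(size, HEAP_GROW_STEP)
        if self.heap_limit - start > size:
            self.free_list.append((start + size, self.heap_limit - start - size))
        return start

    def deallocate(self, start, size):
        # Returned space goes back to the pool for reuse.
        self.free_list.append((start, size))

mm = MemoryManager()
a = mm.allocate(16)
b = mm.allocate(16)
mm.deallocate(a, 16)
c = mm.allocate(8)   # satisfied from the free list: the heap does not grow
assert mm.heap_limit == 64
```

Note that, as the text says, `deallocate` returns space to the pool rather than to the operating system; `heap_limit` never shrinks.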
Memory managers typically do not return memory to the operating system, even if the
program's heap usage drops. Memory management would be simpler if (a) all allocation
requests were for chunks of the same size, and (b) storage were released predictably, say,
first-allocated first-de-allocated. There are some languages, such as Lisp, for which condition
(a) holds; pure Lisp uses only one data element — a two pointer cell — from which all data
structures are built. Condition (b) also holds in some situations, the most common being data
that can be allocated on the run-time stack. However, in most languages, neither (a) nor (b)
holds in general. Rather, data elements of different sizes are allocated, and there is no good
way to predict the lifetimes of all allocated objects.
Space Efficiency. A memory manager should minimize the total heap space needed
by a program. Doing so allows larger programs to run in a fixed virtual address space.
Space efficiency is achieved by minimizing "fragmentation," discussed in Section
7.4.4.
Program Efficiency. A memory manager should make good use of the memory
subsystem to allow programs to run faster. The time taken to execute an instruction
can vary widely depending on where objects are placed in memory. Fortunately,
programs tend to exhibit "locality," which refers to the nonrandom, clustered way in
which typical programs access memory. By paying attention to the placement of objects
in memory, the memory manager can make better use of space and, hopefully, make the
program run faster.
METHOD: The algorithm, shown in Fig. 7.21, uses several simple data structures. A list
called Free holds objects known to be free. A list called Unscanned holds objects that we have
determined are reached, but whose successors we have not yet considered. That is, we have
not scanned these objects to see what other objects can be reached through them. The
Unscanned list is empty initially.
Additionally, each object includes a bit to indicate whether it has been reached (the
reached-bit). Before the algorithm begins, all allocated objects have the reached-bit set to 0.
In line (1) of Fig. 7.21, we initialize the Unscanned list by placing there all the objects
referenced by the root set. The reached-bit for these objects is also set to 1. Lines (2) through
(7) are a loop, in which we, in turn, examine each object o that is ever placed on the
Unscanned list. The for-loop of lines (4) through (7) implements the scanning of object o.
We examine each object d for which we find a reference within o. If d has already been
reached (its reached-bit is 1), then there is no need to do anything about d; it either has been
scanned previously, or it is on the Unscanned list to be scanned later. However, if d was not
reached already, then we need to set its reached-bit to 1 in line (6) and add d to the
Unscanned list in line (7). Figure 7.22 illustrates this process. It shows an Unscanned list with
four objects. The first object on this list, corresponding to object o in the discussion above, is
in the process of being scanned. The dashed lines correspond to the three kinds of objects that
might be reached from o:
1. A previously scanned object that need not be scanned again.
2. An object currently on the Unscanned list.
3. An item that is reachable, but was previously thought to be unreached.
Lines (8) through (11), the sweeping phase, reclaim the space of all the objects that remain
unreached at the end of the marking phase. Note that these will include any objects that were
on the Free list originally. Because the set of unreached objects cannot be enumerated
directly, the algorithm sweeps through the entire heap. Line (10) puts free and unreached
objects on the Free list, one at a time. Line (11) handles the reachable objects. We set their
reached-bit to 0, in order to maintain the proper preconditions for the next execution of the
garbage-collection algorithm.
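The marking and sweeping phases can be sketched in Python as follows. The class and field names are illustrative stand-ins for the heap objects, reached-bits, and lists described above, and the line-number comments refer to the lines of Fig. 7.21 as numbered in the text:

```python
# Mark-and-sweep sketch: mark every object reachable from the roots
# via an Unscanned list, then sweep the whole heap, freeing whatever
# stayed unmarked and resetting reached-bits on survivors.

class Obj:
    def __init__(self, name):
        self.name = name
        self.references = []   # outgoing pointers to other objects
        self.reached = False   # the "reached-bit"

def mark_and_sweep(heap, roots, free_list):
    # Line (1): seed Unscanned with the objects referenced by the root set.
    unscanned = []
    for obj in roots:
        if not obj.reached:
            obj.reached = True
            unscanned.append(obj)
    # Lines (2)-(7): scan each object ever placed on the Unscanned list.
    while unscanned:
        o = unscanned.pop()
        for d in o.references:
            if not d.reached:        # line (5): d not reached before
                d.reached = True     # line (6)
                unscanned.append(d)  # line (7)
    # Lines (8)-(11): sweep the entire heap.
    for obj in heap:
        if not obj.reached:
            free_list.append(obj)    # line (10): reclaim unreached object
        else:
            obj.reached = False      # line (11): reset for the next GC

a, b, c, garbage = Obj("a"), Obj("b"), Obj("c"), Obj("garbage")
a.references = [b]
b.references = [c]
garbage.references = [a]   # pointing at live data does not keep you alive
free = []
mark_and_sweep([a, b, c, garbage], roots=[a], free_list=free)
assert [o.name for o in free] == ["garbage"]
```

The final loop shows why the algorithm must sweep the entire heap: the unreached objects cannot be enumerated directly, only discovered by checking every object's reached-bit.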
Using assembly language as output makes code generation easier: we can generate symbolic
instructions and use the macro facilities of the assembler in generating code. The price is an
additional assembly step after code generation.
3. Memory Management
Mapping the names in the source program to the addresses of data objects is done
cooperatively by the front end and the code generator. A name in a three-address statement
refers to the symbol-table entry for that name. From the symbol-table entry, a relative
address can be determined for the name.
4. Instruction selection
Selecting the best instructions improves the efficiency of the program. The instruction set
should be complete and uniform. Instruction speeds and machine idioms also play a major
role when efficiency is considered. If we do not care about the efficiency of the target
program, then instruction selection is straightforward.
For example, the three-address statements below would be translated into the following code
sequence:
P:=Q+R
S:=P+T
MOV Q, R0
ADD R, R0
MOV R0, P
MOV P, R0
ADD T, R0
MOV R0, S
Here the fourth statement is redundant, as it reloads the value of P that the previous
statement has just stored. This leads to an inefficient code sequence. A
given intermediate representation can be translated into many code sequences, with
significant cost differences between the different implementations. A prior knowledge of
instruction cost is needed in order to design good sequences, but accurate cost information is
difficult to predict.
5. Register allocation issues
The use of registers makes computations faster than the use of memory, so efficient
utilization of registers is important. Register use is subdivided into two subproblems:
1. During register allocation, we select the set of variables that will reside in registers at each
point in the program.
2. During a subsequent register assignment phase, a specific register is picked for each
variable.
As the number of variables increases, the optimal assignment of registers to variables
becomes difficult; mathematically, the problem is NP-complete. Certain machines require
register pairs, consisting of an even register and the next odd-numbered register, for some
operands and results. For example, the multiply instruction
M a, b
involves a register pair in which a, the multiplicand, is the even register and b, the
multiplier, is the odd register of the even/odd register pair.
Evaluation order –
The code generator decides the order in which instructions will be executed. The order of
computations affects the efficiency of the target code: among the many possible
computational orders, some require fewer registers to hold intermediate results than others.
However, picking the best order in the general case is NP-complete.
Approaches to code generation issues: A code generator must always generate correct code.
This is essential because of the number of special cases a code generator might face. Some
of the design goals of a code generator are:
Correct
Easily maintainable
Testable
Efficient
Within a basic block, all statements execute sequentially, one after the other, in the order in
which they appear.
Flow Graphs-
A flow graph is a directed graph with flow control information added to the basic blocks.
The basic blocks serve as nodes of the flow graph.
Example:
Compute the basic blocks for the given three address statements-
(1) PROD = 0
(2) I = 1
(3) T2 = addr(A) – 4
(4) T4 = addr(B) – 4
(5) T1 = 4 x I
(6) T3 = T2[T1]
(7) T5 = T4[T1]
(8) T6 = T3 x T5
(9) PROD = PROD + T6
(10) I = I + 1
(11) IF I <=20 GOTO (5)
Solution-
The leaders are statement (1), the first statement of the code, and statement (5), the target of
the conditional jump in statement (11).
The given code can therefore be partitioned into two basic blocks:
Block B1: statements (1) to (4)
Block B2: statements (5) to (11)
The required flow graph has B1 as the initial node, with an edge from B1 to B2 and, because
of the jump in statement (11), an edge from B2 back to itself.
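The partitioning step can be sketched with the standard leader rules. A Python sketch (the instruction encoding as (text, jump-target) pairs is illustrative):

```python
# Partition three-address code into basic blocks: the first statement
# is a leader, any jump target is a leader, and any statement
# immediately following a jump is a leader. Each block runs from a
# leader up to, but not including, the next leader.

def find_leaders(code):
    """code is a list of (statement_text, jump_target_or_None)."""
    leaders = {1}                       # rule 1: first statement
    for idx, (_, target) in enumerate(code, start=1):
        if target is not None:
            leaders.add(target)         # rule 2: jump target
            if idx < len(code):
                leaders.add(idx + 1)    # rule 3: statement after a jump
    return sorted(leaders)

def basic_blocks(code):
    leaders = find_leaders(code)
    bounds = leaders + [len(code) + 1]
    return [list(range(bounds[i], bounds[i + 1]))
            for i in range(len(leaders))]

# The eleven statements of the example; only (11) is a jump, to (5).
code = [("PROD = 0", None), ("I = 1", None),
        ("T2 = addr(A) - 4", None), ("T4 = addr(B) - 4", None),
        ("T1 = 4 * I", None), ("T3 = T2[T1]", None),
        ("T5 = T4[T1]", None), ("T6 = T3 * T5", None),
        ("PROD = PROD + T6", None), ("I = I + 1", None),
        ("IF I <= 20 GOTO (5)", 5)]
blocks = basic_blocks(code)
assert blocks == [[1, 2, 3, 4], [5, 6, 7, 8, 9, 10, 11]]
```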
There are two type of basic block optimization. These are as follows:
1. Structure-Preserving Transformations
2. Algebraic Transformations
1. Structure preserving transformations:
The primary structure-preserving transformations on basic blocks are as follows:
(a) Common subexpression elimination
Consider the block:
1. a : = b + c
2. b : = a - d
3. c : = b + c
4. d : = a - d
In the above block, the second and fourth statements compute the same expression. So the
block can be transformed as follows:
1. a : = b + c
2. b : = a - d
3. c : = b + c
4. d : = b
(b) Dead-code elimination
o A variable is dead if it is defined but never subsequently used; such definitions serve no
purpose and can be removed.
o Suppose the statement x : = y + z appears in a block and x is dead, that is, never
subsequently used. Then this statement can be safely removed without changing the value of
the basic block.
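The dead-code rule can be sketched as a single backward pass over the block. This is a Python sketch under simplifying assumptions: statements are side-effect-free (target, operands) tuples, and the set of variables live on exit from the block is given:

```python
# Local dead-code elimination: scanning backwards, keep an assignment
# only if its target is still needed (used later in the block or live
# on exit); a kept statement makes its operands needed in turn.

def eliminate_dead_code(block, live_on_exit):
    needed = set(live_on_exit)
    kept = []
    for target, operands in reversed(block):
        if target in needed:
            kept.append((target, operands))
            needed.discard(target)
            needed.update(operands)
        # else: the target is dead here, so the statement is removed
    kept.reverse()
    return kept

block = [("x", ("y", "z")),   # x is never used again: dead
         ("a", ("b", "c"))]
assert eliminate_dead_code(block, live_on_exit={"a"}) == [("a", ("b", "c"))]
```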
(c) Renaming temporary variables
A statement t : = b + c, where t is a temporary variable, can be changed to u : = b + c, where
u is a new temporary variable. All instances of t can then be replaced with u without
changing the value of the basic block.
(d) Interchange of statement
Suppose a block has the following two adjacent statements:
1. t1 : = b + c
2. t2 : = x + y
These two statements can be interchanged without affecting the value of the block, provided
that the value of t1 does not affect the value of t2.
2. Algebraic transformations:
o In an algebraic transformation, we change the set of expressions into an algebraically
equivalent set. Thus expressions such as x : = x + 0 or x : = x * 1 can be eliminated
from a basic block without changing the set of expressions it computes.
o For example, for the statements
1. a : = b + c
2. e : = c + d + b
The following intermediate code may be generated:
1. a:= b + c
2. t:= c +d
3. e:= t + b
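The algebraic rules above can be sketched as a single pass over a block of three-address tuples (the tuple encoding is illustrative):

```python
# Algebraic transformation sketch: statements of the form x := x + 0
# and x := x * 1 compute nothing new, so they can be removed from the
# block without changing the set of expressions it computes.

def simplify(block):
    """block: list of (target, op, left, right) three-address tuples."""
    result = []
    for target, op, left, right in block:
        if op == "+" and left == target and right == "0":
            continue   # x := x + 0 is a no-op
        if op == "*" and left == target and right == "1":
            continue   # x := x * 1 is a no-op
        result.append((target, op, left, right))
    return result

block = [("a", "+", "b", "c"),
         ("x", "+", "x", "0"),
         ("x", "*", "x", "1"),
         ("e", "+", "t", "b")]
assert simplify(block) == [("a", "+", "b", "c"), ("e", "+", "t", "b")]
```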
A code generator generates target code for a sequence of three-address statements, making
effective use of registers to store the operands of the statements.
For example, consider the three-address statement a : = b + c. It can have the following code
sequences:
ADD Rj, Ri    Cost = 1    (if the values of b and c are already in registers Ri and Rj)
(or)
ADD c, Ri    Cost = 2    (if the value of b is in Ri and c is in memory)
(or)
MOV c, Rj    Cost = 3    (if b is in Ri and c must first be moved into Rj)
ADD Rj, Ri
Register and Address Descriptors:
• A register descriptor keeps track of what is currently in each register. The register
descriptors show that initially all the registers are empty.
• An address descriptor stores the location where the current value of a name can be found
at run time.
A code-generation algorithm:
The algorithm takes as input a sequence of three-address statements constituting a basic
block. For each three-address statement of the form x : = y op z, perform the following
actions:
1. Invoke a function getreg to determine the location L where the result of the computation y
op z should be stored.
2. Consult the address descriptor for y to determine y’, the current location of y. Prefer the
register for y’ if the value of y is currently both in memory and a register. If the value of y is
not already in L, generate the instruction MOV y’ , L to place a copy of y in L.
3. Generate the instruction op z', L, where z' is the current location of z. Prefer a register for
z' if the value of z is currently both in memory and a register. Update the address descriptor
of x to indicate that x is in location L. If L is a register, update its descriptor to indicate that
it contains the value of x, and remove x from all other register descriptors.
4. If the current values of y or z have no next uses, are not live on exit from the block, and are
in registers, alter the register descriptor to indicate that, after execution of x : = y op z , those
registers will no longer contain y or z
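A simplified Python sketch of this algorithm follows. The getreg function here is deliberately naive (reuse the register already holding y, else take any free register, with no spilling), and the descriptor bookkeeping is abbreviated; all names are illustrative:

```python
# Sketch of code generation for x := y op z with register and
# address descriptors.

class CodeGen:
    def __init__(self, registers):
        self.reg_desc = {r: None for r in registers}  # register -> name held
        self.addr_desc = {}                           # name -> location
        self.code = []

    def getreg(self, y):
        # Prefer the register already holding y, else any free register.
        loc = self.addr_desc.get(y)
        if loc in self.reg_desc:
            return loc
        for r, held in self.reg_desc.items():
            if held is None:
                return r
        raise RuntimeError("no free register (spilling not sketched)")

    def gen(self, x, y, op, z):
        L = self.getreg(y)                   # step 1: pick location L
        y_loc = self.addr_desc.get(y, y)     # step 2: current home of y
        if y_loc != L:
            self.code.append(f"MOV {y_loc}, {L}")
        z_loc = self.addr_desc.get(z, z)     # step 3: op z', L
        self.code.append(f"{op} {z_loc}, {L}")
        self.reg_desc[L] = x                 # L now holds the value of x
        self.addr_desc[x] = L

cg = CodeGen(["R0", "R1"])
cg.gen("t", "a", "SUB", "b")   # t := a - b
cg.gen("u", "t", "ADD", "c")   # u := t + c reuses t's register, no reload
assert cg.code == ["MOV a, R0", "SUB b, R0", "ADD c, R0"]
```

Because the address descriptor records that t is already in R0, the second statement avoids exactly the kind of redundant reload shown in the MOV P, R0 example earlier.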
Generating Code for Assignment Statements:
• The assignment d : = (a - b) + (a - c) + (a - c) might first be translated into the three-address
code sequence
t : = a - b
u : = a - c
v : = t + u
d : = v + u
The code sequence generated for the example is:
Statement | Code generated | Register descriptor | Address descriptor
t : = a - b | MOV a, R0 / SUB b, R0 | R0 contains t | t in R0
u : = a - c | MOV a, R1 / SUB c, R1 | R0 contains t, R1 contains u | t in R0, u in R1
v : = t + u | ADD R1, R0 | R0 contains v, R1 contains u | v in R0, u in R1
d : = v + u | ADD R1, R0 / MOV R0, d | R0 contains d | d in R0 and memory
Generating Code for Indexed Assignments
The table shows the code sequences generated for the indexed assignments a : = b[i] and
a[i] : = b.
4.10 PEEPHOLE OPTIMIZATION
Peephole optimization is a type of Code Optimization performed on a small part of the code.
It is performed on the very small set of instructions in a segment of code.
The small set of instructions or small part of code on which peephole optimization is
performed is known as peephole or window.
It works on the principle of replacement: a part of the code is replaced by shorter and faster
code without any change in the output.
Peephole optimization is a machine-dependent optimization.
Some common peephole optimization techniques are:
1. Redundant load and store elimination:
Redundant loads and stores are eliminated.
Initial code:
y = x + 5;
i = y;
z = i;
w = z * 3;
Optimized code:
y = x + 5;
i = y;
w = y * 3;
2. Constant folding:
The code that can be simplified by user itself, is simplified.
Initial code:
x = 2 * 3;
Optimized code:
x = 6;
3. Strength Reduction:
The operators that consume higher execution time are replaced by the operators consuming
less execution time.
Initial code:
y = x * 2;
Optimized code:
y = x + x; or y = x << 1;
Initial code:
y = x / 2;
Optimized code:
y = x >> 1;
4. Null sequences:
Useless operations are deleted.
5. Combine operations:
Several operations are replaced by a single equivalent operation.
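A peephole pass can be sketched as sliding a two-instruction window over the code. The Python sketch below implements one of the rules above, redundant load elimination, and removes exactly the redundant MOV from the instruction-selection example earlier; the instruction syntax is illustrative:

```python
# Peephole sketch: MOV R, X immediately followed by MOV X, R makes
# the second instruction a redundant load, which is dropped.

def peephole(code):
    out = []
    for instr in code:
        if (out and instr.startswith("MOV")
                and out[-1].startswith("MOV")):
            prev_src, prev_dst = out[-1][4:].split(", ")
            src, dst = instr[4:].split(", ")
            if src == prev_dst and dst == prev_src:
                continue   # X is already in R: skip the reload
        out.append(instr)
    return out

code = ["MOV R0, P",   # store R0 into P
        "MOV P, R0",   # redundant: P's value is already in R0
        "ADD T, R0"]
assert peephole(code) == ["MOV R0, P", "ADD T, R0"]
```

Real peephole optimizers repeat such passes until no window matches any rule, since one replacement can expose another.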
Advantage
Heavily used values reside in registers
Disadvantage
Does not consider non-uniform distribution of uses
Need for global register allocation
Local allocation does not take into account that some instructions (e.g. those in loops)
execute more frequently. It forces us to store/load at basic block endpoints since each block
has no knowledge of the context of others.
Global allocation is needed to find the live range(s) of each variable and the area(s) where
the variable is used or defined. The cost of spilling will depend on the frequencies and
locations of uses.
Register allocation depends on:
Size of live range
Number of uses/definitions
Frequency of execution
Number of loads/stores needed.
Cost of loads/stores needed.
Register allocation by graph coloring
Global register allocation can be seen as a graph coloring problem.
Basic idea:
1. Identify the live range of each variable
2. Build an interference graph that represents conflicts between live ranges (two nodes are
connected if the variables they represent are live at the same moment)
3. Try to color the nodes of the graph with at most as many colors as there are registers, so
that any two neighbors have different colors
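The coloring step can be sketched with a greedy strategy: give each node the lowest color not used by its already-colored neighbors. This is a minimal Python sketch; real allocators add the simplify/spill phases that the greedy approach lacks:

```python
# Greedy graph coloring for register allocation: k colors = k registers.
# Returns a node -> color map, or None if some node cannot be colored
# (which a real allocator would handle by spilling a live range).

def color_graph(interference, k):
    colors = {}
    for node in sorted(interference):   # fixed order, for determinism
        taken = {colors[n] for n in interference[node] if n in colors}
        free = [c for c in range(k) if c not in taken]
        if not free:
            return None                 # would need to spill
        colors[node] = free[0]
    return colors

# a's live range overlaps those of b and c; b and c do not overlap,
# so two registers suffice: b and c can share one.
interference = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
coloring = color_graph(interference, k=2)
assert coloring == {"a": 0, "b": 1, "c": 1}
```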
4.12 DYNAMIC PROGRAMMING CODE-GENERATION
The dynamic programming algorithm proceeds in three phases (assume the target machine
has r registers):
1. Compute bottom-up for each node n of the expression tree T an array C of costs, in which
the ith component C[i] is the optimal cost of computing the subtree S rooted at n into a
register, assuming i registers are available for the computation, for 1 ≤ i ≤ r.
2. Traverse T, using the cost vectors to determine which subtrees of T must be computed
into memory.
3. Traverse each tree using the cost vectors and associated instructions to generate the
final target code. The code for the subtrees computed into memory locations is generated
first.
Each of these phases can be implemented to run in time linearly proportional to the size of
the expression tree.
The cost of computing a node n includes whatever loads and stores are necessary to evaluate
S in the given number of registers. It also includes the cost of computing the operator at the
root of S. The zeroth component of the cost vector is the optimal cost of computing the
subtree S into memory. The contiguous evaluation property ensures that an optimal program
for S can be generated by considering combinations of optimal programs only for the subtrees
of the root of S. This restriction reduces the number of cases that need to be considered.
In order to compute the costs C[i] at node n, we view the instructions as tree-rewriting rules,
as in Section 8.9. Consider each template E that matches the input tree at node n. By
examining the cost vectors at the corresponding descendants of n, determine the costs of
evaluating the operands at the leaves of E. For those operands of E that are registers, consider
all possible orders in which the corresponding subtrees of T can be evaluated into registers. In
each ordering, the first subtree corresponding to a register operand can be evaluated using i
available registers, the second using i -1 registers, and so on. To account for node n, add in
the cost of the instruction associated with the template E. The value C[i] is then the minimum
cost over all possible orders.
The cost vectors for the entire tree T can be computed bottom up in time linearly proportional
to the number of nodes in T. It is convenient to store at each node the instruction used to
achieve the best cost for C[i] for each value of i. The smallest cost in the vector for the root of
T gives the minimum cost of evaluating T.
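The bottom-up cost-vector computation can be sketched in Python for a two-register machine like the one in the example that follows. This is a sketch under stated assumptions: every instruction costs 1, both op Ri, Ri, Rj and op Ri, Ri, Mj forms exist, C[0] is the cost of computing a subtree into memory, and C[i] the cost into a register with i registers available:

```python
# Cost-vector dynamic programming over an expression tree.
# A leaf is a string (a memory operand); an interior node is a tuple
# (op, left, right).

INF = float("inf")

def cost_vector(tree, r):
    if isinstance(tree, str):            # leaf: already in memory
        return [0] + [1] * r             # 0 into memory, 1 load into a register
    _, left, right = tree
    L, R = cost_vector(left, r), cost_vector(right, r)
    C = [INF] * (r + 1)
    for i in range(1, r + 1):
        options = [L[i] + R[0] + 1]      # right into memory, op Ri, Ri, Mj
        if i >= 2:                       # both operands in registers
            options.append(L[i] + R[i - 1] + 1)   # left subtree first
            options.append(R[i] + L[i - 1] + 1)   # right subtree first
        C[i] = min(options)
    C[0] = min(C[1:]) + 1                # best register cost plus one store
    return C

# (a - b) + c * (d / e): each leaf has cost vector (0,1,1), and the
# smallest cost at the root gives the minimum cost of evaluating T.
tree = ("+", ("-", "a", "b"), ("*", "c", ("/", "d", "e")))
assert cost_vector("a", 2) == [0, 1, 1]
assert cost_vector(tree, 2) == [8, 8, 7]
```

The root vector (8, 8, 7) matches the hand computation below: with one register the minimum cost is 8, and allowing both registers reduces it to 7.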
Example: Consider a machine having two registers R0 and R1, and the following
instructions, each of unit cost (Ri stands for either register and Mj for a memory location):
LD Ri, Mj (load Mj into Ri)
op Ri, Ri, Rj (Ri := Ri op Rj)
op Ri, Ri, Mj (Ri := Ri op Mj)
LD Ri, Rj (register-to-register copy)
ST Mi, Rj (store Rj into Mi)
For a leaf such as a, the cost of computing it into memory is 0, since it is already there. The
cost of computing it into a register with one register available is 1 (a single load), and the
cost of computing it into a register with two registers available is the same as that with one
register available. The cost vector at leaf a is therefore (0,1,1).
Consider the cost vector at the root. We first determine the minimum cost of computing the
root with one and two registers available. The machine instruction ADD R0, R0, M matches
the root, because the root is labeled with the operator +. Using this instruction, the minimum
cost of evaluating the root with one register available is the minimum cost of computing its
right subtree into memory, plus the minimum cost of computing its left subtree into the
register, plus 1 for the instruction. No other way exists. The cost vectors at the right and left
children of the root show that the minimum cost of computing the root with one register
available is 5 + 2 + 1 = 8.
Now consider the minimum cost of evaluating the root with two registers available. Three
cases arise depending on which instruction is used to compute the root and in what order the
left and right subtrees of the root are evaluated.
Dynamic programming techniques have been used in a number of compilers, including the
second version of the portable C compiler, PCC2. The technique facilitates retargeting
because of the applicability of the dynamic programming technique to a broad class of
machines.