
UNIT – IV

Run-Time Environments: Stack Allocation of Space, Access to Nonlocal Data on the Stack,
Heap Management, Introduction to Garbage Collection, Introduction to Trace-Based
Collection.
Code Generation: Issues in the Design of a Code Generator, The Target Language, Addresses
in the Target Code, Basic Blocks and Flow Graphs, Optimization of Basic Blocks, A Simple
Code Generator, Peephole Optimization, Register Allocation and Assignment, Dynamic
Programming Code-Generation.

4.1 STACK ALLOCATION OF SPACE

Almost all compilers for languages that use procedures, functions, or methods as units of
user-defined actions manage at least part of their run-time memory as a stack. Each time a
procedure is called, space for its local variables is pushed onto a stack, and when the
procedure terminates, that space is popped off the stack. As we shall see, this arrangement not
only allows space to be shared by procedure calls whose durations do not overlap in time, but
it allows us to compile code for a procedure in such a way that the relative addresses of its
nonlocal variables are always the same, regardless of the sequence of procedure calls.
Stack allocation is a strategy in which the stack is used to organize the storage. The stack used
in stack allocation is known as the control stack. In this type of allocation, data objects are
created dynamically. Memory is allocated in units of activation records, which are pushed
onto the stack in Last In First Out (LIFO) order. Locals are stored in the activation records at
run time, and memory addressing is done using pointers and registers.

● Stack allocation creates data objects and activation records dynamically.
● Allocation of data objects is performed at run time.
● It supports recursive procedures.
● Stack allocation uses a stack to manage the allocation of memory at run time, as the sketch below illustrates.
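As a rough illustration (not from the text), here is a minimal Python sketch of a control stack: each call pushes an activation record holding the procedure's locals, and each return pops it in LIFO order. The ActivationRecord class and all names are illustrative assumptions.

class ActivationRecord:
    def __init__(self, proc_name, locals_):
        self.proc_name = proc_name   # which procedure this activation belongs to
        self.locals = locals_        # locals live only while the record is on the stack

control_stack = []                   # the run-time control stack

def call_factorial(n):
    control_stack.append(ActivationRecord("factorial", {"n": n}))  # push on call
    result = 1 if n <= 1 else n * call_factorial(n - 1)
    control_stack.pop()              # pop on return; the locals disappear
    return result

print(call_factorial(4))             # 24; the stack grows to depth 4, then unwinds

Note how two activations whose durations do not overlap in time can reuse the same stack space.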

4.2 ACCESS TO NONLOCAL DATA ON THE STACK

The scope of a declaration in a block-structured language is given by the most-closely-nested rule:
● The scope of a declaration in a block B includes B.
● If a name X is not declared in a block B, then an occurrence of X in B is in the scope of a declaration of X in an enclosing block B′ such that
● B′ has a declaration of X, and
● B′ is more closely nested around B than any other block with a declaration of X.
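To make the rule concrete, here is a small Python sketch (names are illustrative); Python's nested functions follow the same most-closely-nested rule:

x = "outer"              # declaration in the outermost block

def B1():
    x = "B1"             # B1 declares its own x
    def B2():
        # x is not declared in B2, so this occurrence of x is in the
        # scope of the declaration in B1, the most closely nested
        # enclosing block that declares x
        print(x)
    B2()

B1()                     # prints "B1", not "outer"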

4.3 HEAP MANAGEMENT


Introduction: The heap is used for dynamically allocated memory; its important operations
include allocation and de-allocation. In C++, Pascal, and Java, allocation is done via the
new operator, while in C it is done via the malloc function call. De-allocation is done either
explicitly by the program (e.g., free in C, delete in C++) or automatically by a garbage
collector (as in Java). The heap is the portion of the store that is used for data that lives
indefinitely, or until the program explicitly deletes it.
While local variables typically become inaccessible when their procedures end, many
languages enable us to create objects or other data whose existence is not tied to the
procedure activation that creates them. For example, both C++ and Java give the programmer
new to create objects that may be passed — or pointers to them may be passed — from
procedure to procedure, so they continue to exist long after the procedure that created them is
gone. Such objects are stored on a heap.
Heap Memory Manager: The memory manager keeps track of all the free space in heap
storage at all times. It performs two basic functions:

● Allocation. When a program requests memory for a variable or object, the memory
manager produces a chunk of contiguous heap memory of the requested size. If
possible, it satisfies an allocation request using free space in the heap; if no chunk of
the needed size is available, it seeks to increase the heap storage space by getting
consecutive bytes of virtual memory from the operating system. If space is exhausted,
the memory manager passes that information back to the application program.

● De-allocation. The memory manager returns de-allocated space to the pool of free
space, so it can reuse the space to satisfy other allocation requests.
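The following minimal Python sketch (illustrative only, with a first-fit policy and no coalescing) shows these two functions over a free list of (offset, size) chunks:

free_list = [(0, 1024)]              # one free chunk: offset 0, 1024 bytes

def allocate(size):
    for i, (off, sz) in enumerate(free_list):
        if sz >= size:               # first fit: carve from this chunk
            if sz > size:
                free_list[i] = (off + size, sz - size)   # keep the remainder free
            else:
                free_list.pop(i)
            return off
    return None                      # exhausted: a real manager would ask the OS

def deallocate(off, size):
    free_list.append((off, size))    # return the chunk to the pool
                                     # (a real manager would coalesce neighbors)

p = allocate(100)                    # offset 0
q = allocate(900)                    # offset 100; only 24 bytes remain free
deallocate(p, 100)                   # the first chunk goes back on the free list
r = allocate(50)                     # too big for the 24-byte remainder,
print(p, q, r)                       # 0 100 0: the freed space is reused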

Memory managers typically do not return memory to the operating system, even if the
program's heap usage drops. Memory management would be simpler if (a) all allocation
requests were for chunks of the same size, and (b) storage were released predictably, say,
first-allocated first-de-allocated. There are some languages, such as Lisp, for which condition
(a) holds; pure Lisp uses only one data element — a two-pointer cell — from which all data
structures are built. Condition (b) also holds in some situations, the most common being data
that can be allocated on the run-time stack. However, in most languages, neither (a) nor (b)
holds in general. Rather, data elements of different sizes are allocated, and there is no good
way to predict the lifetimes of all allocated objects.

Here are the properties we desire of memory managers:

● Space Efficiency. A memory manager should minimize the total heap space needed
by a program. Doing so allows larger programs to run in a fixed virtual address space.
Space efficiency is achieved by minimizing "fragmentation," discussed in Section
7.4.4.

● Program Efficiency. A memory manager should make good use of the memory
subsystem to allow programs to run faster. The time taken to execute an instruction
can vary widely depending on where objects are placed in memory. Fortunately,
programs tend to exhibit "locality," which refers to the nonrandom clustered way in
which typical programs access memory. By paying attention to the placement of objects
in memory, the memory manager can make better use of space and, hopefully, make the
program run faster.

● Low Overhead. Because memory allocations and de-allocations are frequent
operations in many programs, it is important that these operations be as efficient as
possible. That is, we wish to minimize the overhead — the fraction of execution time
spent performing allocation and de-allocation. Notice that the cost of allocations is
dominated by small requests; the overhead of managing large objects is less
important, because it usually can be amortized over a larger amount of computation.

4.4 INTRODUCTION TO GARBAGE COLLECTION

In computer science, garbage collection is a type of memory management. It automatically
cleans up unused objects and pointers in memory, allowing the resources to be used again.
Some programming languages have built-in garbage collection, while others require custom
functions to manage unused memory.
A common method of garbage collection is called reference counting. This strategy simply
counts how many references there are to each object stored in memory. If an object has zero
references, it is considered unnecessary and can be deleted to free up the space in memory.
Advanced reference counting detects objects that only reference each other, which indicates
the objects are unused by the parent process.
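A bare-bones Python sketch of reference counting (all names are illustrative): each object carries a count of incoming references, and an object is reclaimed the moment its count drops to zero.

class Obj:
    def __init__(self, name):
        self.name = name
        self.refcount = 0            # number of references to this object

heap = {}

def new_obj(name):
    heap[name] = Obj(name)
    return heap[name]

def add_ref(o):
    o.refcount += 1

def drop_ref(o):
    o.refcount -= 1
    if o.refcount == 0:              # unreferenced: reclaim immediately
        print("freeing", o.name)
        del heap[o.name]

a = new_obj("a"); add_ref(a)
drop_ref(a)                          # prints: freeing a

Two objects that reference only each other never reach a count of zero; detecting such cycles is exactly the job of the advanced schemes mentioned above.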
Garbage collection may also be done at compile-time, when a program's source code is
compiled into an executable program. In this method, the compiler determines which
resources in memory will never be accessed after a certain time. It can then add instructions
to automatically deallocate those resources from memory. While this is an effective way to
eliminate unused objects, it must be done conservatively to avoid deleting references required
by the program.
Garbage collection is an important part of software development since it keeps programs
from using up too much RAM. Besides helping programs run more efficiently, it can also
prevent serious bugs, such as memory leaks, that can cause a program to crash.

4.5 INTRODUCTION TO TRACE-BASED COLLECTION


Introduction: Instead of collecting garbage as it is created, trace-based collectors run
periodically to find unreachable objects and reclaim their space. Typically, we run the trace-
based collector whenever the free space is exhausted or its amount drops below some
threshold.

We begin this section by introducing the simplest "mark-and-sweep" garbage collection
algorithm. We then describe the variety of trace-based algorithms in terms of four states that
chunks of memory can be put in. This section also contains a number of improvements on the
basic algorithm, including those in which object relocation is a part of the garbage-collection
function.
A Basic Mark-and-Sweep Collector: Mark-and-sweep garbage-collection algorithms are
straightforward, stop-the-world algorithms that find all the unreachable objects and put them
on the list of free space. The algorithm below visits and "marks" all the reachable objects in
a first tracing step and then "sweeps" the entire heap to free up unreachable objects. A second
algorithm, which we consider after introducing a general framework for trace-based
algorithms, is an optimization of this basic one: by using an additional list to hold all the
allocated objects, it visits the reachable objects only once.
Algorithm: Mark-and-sweep garbage collection.
INPUT: A root set of objects, a heap, and a free list, called Free, with all the unallocated
chunks of the heap. All chunks of space are marked with boundary tags to indicate their
free/used status and size.
OUTPUT: A modified free list after all the garbage has been removed.

METHOD: The algorithm, shown in Fig. 7.21, uses several simple data structures. List Free
holds objects known to be free. A list called Unscanned holds objects that we have
determined are reached, but whose successors we have not yet considered. That is, we have
not scanned these objects to see what other objects can be reached through them. The
Unscanned list is empty initially.

Additionally, each object includes a bit to indicate whether it has been reached (the reached-
bit). Before the algorithm begins, all allocated objects have the reached-bit set to 0.
In line (1) of Fig. 7.21, we initialize the Unscanned list by placing there all the objects
referenced by the root set. The reached-bit for these objects is also set to 1. Lines (2) through
(7) are a loop, in which we, in turn, examine each object o that is ever placed on the
Unscanned list. The for-loop of lines (4) through (7) implements the scanning of object o.
We examine each object d for which we find a reference within o. If d has already been
reached (its reached-bit is 1), then there is no need to do anything about d; it either has been
scanned previously, or it is on the Unscanned list to be scanned later. However, if d was not
reached already, then we need to set its reached-bit to 1 in line (6) and add d to the
Unscanned list in line (7). Figure 7.22 illustrates this process. It shows an Unscanned list with
four objects. The first object on this list, corresponding to object o in the discussion above, is
in the process of being scanned. The dashed lines correspond to the three kinds of objects that
might be reached from o:

1. A previously scanned object that need not be scanned again.
2. An object currently on the Unscanned list.
3. An item that is reachable, but was previously thought to be unreached.

Lines (8) through (11), the sweeping phase, reclaim the space of all the objects that remain
unreached at the end of the marking phase. Note that these will include any objects that were
on the Free list originally. Because the set of unreached objects cannot be enumerated
directly, the algorithm sweeps through the entire heap. Line (10) puts free and unreached
objects on the Free list, one at a time. Line (11) handles the reachable objects. We set their
reached-bit to 0, in order to maintain the proper preconditions for the next execution of the
garbage-collection algorithm.
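The algorithm can be summarized in a short Python sketch (illustrative; the object layout and names are assumptions, and the line numbers in the comments refer to the description above):

class Obj:
    def __init__(self):
        self.refs = []                   # outgoing references to other objects
        self.reached = False             # the reached-bit, initially 0

def mark_and_sweep(heap, roots, free):
    unscanned = list(roots)              # line (1): objects referenced by the root set
    for o in unscanned:
        o.reached = True
    while unscanned:                     # lines (2)-(7): marking phase
        o = unscanned.pop()
        for d in o.refs:                 # scan o: examine each reference in it
            if not d.reached:
                d.reached = True         # line (6)
                unscanned.append(d)      # line (7)
    for o in heap:                       # lines (8)-(11): sweeping phase
        if not o.reached:
            free.append(o)               # line (10): unreached, reclaim
        else:
            o.reached = False            # line (11): reset for the next collection

a, b, c = Obj(), Obj(), Obj()
a.refs = [b]                             # c is unreachable
free = []
mark_and_sweep([a, b, c], [a], free)
print(len(free))                         # 1: only c is collected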

4.6 ISSUES IN THE DESIGN OF A CODE GENERATOR


The code generator converts the intermediate representation of source code into a form that
can be readily executed by the machine. A code generator is expected to generate correct
code. The code generator should be designed so that it can be easily implemented, tested,
and maintained.

The following issues arise during the code generation phase:


1. Input to code generator
The input to the code generator is the intermediate code generated by the front end, along
with information in the symbol table that determines the run-time addresses of the data
objects denoted by the names in the intermediate representation. Intermediate code may be
represented as quadruples, triples, indirect triples, postfix notation, syntax trees, DAGs, etc.
The code generation phase proceeds on the assumption that the input is free of syntactic and
static semantic errors, that the necessary type checking has taken place, and that type-conversion
operators have been inserted wherever necessary.
2. Target program
The target program is the output of the code generator. The output may be absolute machine
language, relocatable machine language, or assembly language.
● Absolute machine language as output has the advantage that it can be placed in a fixed
memory location and immediately executed.

● Relocatable machine language as output allows subprograms and subroutines to be
compiled separately. Relocatable object modules can be linked together and loaded by a
linking loader, at the added expense of linking and loading.

● Assembly language as output makes code generation easier: we can generate symbolic
instructions and use the macro facilities of the assembler. However, an additional assembly
step is needed after code generation.
3. Memory Management
Mapping the names in the source program to the addresses of data objects is done by the front
end and the code generator. A name in a three-address statement refers to the symbol-table
entry for the name. From the symbol-table entry, a relative address can be determined for
the name.
4. Instruction selection
Selecting the best instructions improves the efficiency of the program. The instruction set
should be complete and uniform. Instruction speeds and machine idioms also play a major
role when efficiency is considered. But if we do not care about the efficiency of the target
program, then instruction selection is straightforward.
For example, the three-address statements below would be translated into the code sequence
that follows:

P := Q + R
S := P + T

MOV Q, R0
ADD R, R0
MOV R0, P
MOV P, R0
ADD T, R0
MOV R0, S
Here the fourth statement is redundant: it reloads the value of P that was just stored by the
previous statement, leading to an inefficient code sequence. A given intermediate
representation can be translated into many code sequences, with significant cost differences
between the different implementations. Prior knowledge of instruction cost is needed in order
to design good sequences, but accurate cost information is difficult to predict.
5. Register allocation issues

Use of registers makes computations faster than use of memory, so efficient utilization of
registers is important. The use of registers is subdivided into two subproblems:
1. During register allocation, we select the set of variables that will reside in registers at
each point in the program.

2. During a subsequent register assignment phase, a specific register is picked for each
variable.
As the number of variables increases, the optimal assignment of registers to variables
becomes difficult; mathematically, the problem is NP-complete. Certain machines require
register pairs consisting of an even-numbered register and the next odd-numbered register.
For example, in

M a, b

the multiplicand a is the even register of an even/odd register pair and b, the multiplier, is
the odd register.
6. Evaluation order

The code generator decides the order in which the instructions will be executed. The order of
computations affects the efficiency of the target code. Among the many computational orders,
some require fewer registers to hold intermediate results. However, picking the best order in
the general case is a difficult, NP-complete problem.
Approaches to code generation issues: A code generator must always generate correct code.
This is essential because of the number of special cases a code generator might face. Some of
the design goals of a code generator are:
● Correct
● Easily maintainable
● Testable
● Efficient

4.7 BASIC BLOCKS AND FLOW GRAPHS


A basic block is a sequence of statements that always executes one after the other, in order.
The characteristics of basic blocks are-
● They do not contain any jump statements, except possibly as the last statement.

● There is no possibility of branching or halting in the middle.

● All the statements execute in the same order they appear.

● Flow of control does not leave the block in the middle.


Example of Basic Block
Three-address code for the expression a = b + c + d is-

T1 = b + c
T2 = T1 + d
a = T2

Here,
● All the statements execute in a sequence one after the other.

● Thus, they form a basic block.

Flow Graphs-
A flow graph is a directed graph with flow-of-control information added to the basic blocks.
● The basic blocks serve as nodes of the flow graph.

● There is a directed edge from block B1 to block B2 if control can flow from B1 to B2,
i.e., if B2 immediately follows B1 in the code and B1 does not end in an unconditional
jump, or if B1 ends in a jump to the leader of B2.

Example:
Compute the basic blocks for the given three address statements-
(1) PROD = 0
(2) I = 1
(3) T2 = addr(A) – 4
(4) T4 = addr(B) – 4
(5) T1 = 4 x I

(6) T3 = T2[T1]

(7) T5 = T4[T1]
(8) T6 = T3 x T5
(9) PROD = PROD + T6
(10) I = I + 1
(11) IF I <=20 GOTO (5)
Solution-

We have-

● PROD = 0 is a leader, since the first statement of the code is always a leader.

● T1 = 4 x I is a leader, since the target of a conditional or unconditional goto statement
is a leader.

Now, the given code can be partitioned into two basic blocks as-

Block B1: statements (1) to (4)
Block B2: statements (5) to (11)

The required flow graph has B1 and B2 as nodes, an edge from B1 to B2 (since B2
immediately follows B1), and an edge from B2 to itself (since the conditional goto in
statement (11) targets statement (5), the leader of B2). A small sketch of this leader-based
partitioning follows.
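A compact Python sketch of the partitioning (the statement encoding is an assumption; statement numbers are 1-based in the text, 0-based here):

code = [
    "PROD = 0", "I = 1", "T2 = addr(A) - 4", "T4 = addr(B) - 4",
    "T1 = 4 * I", "T3 = T2[T1]", "T5 = T4[T1]", "T6 = T3 * T5",
    "PROD = PROD + T6", "I = I + 1", "IF I <= 20 GOTO 5",
]

leaders = {0}                            # rule 1: the first statement
for i, stmt in enumerate(code):
    if "GOTO" in stmt:
        leaders.add(int(stmt.split("GOTO")[1]) - 1)   # rule 2: jump target
        if i + 1 < len(code):
            leaders.add(i + 1)           # rule 3: statement after a jump

starts = sorted(leaders)
blocks = [code[s:e] for s, e in zip(starts, starts[1:] + [len(code)])]
print([len(b) for b in blocks])          # [4, 7]: B1 = (1)-(4), B2 = (5)-(11)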

4.8 OPTIMIZATION OF BASIC BLOCKS


Optimization can be applied to a basic block. The optimization must not change the set of
expressions computed by the block.

There are two types of basic block optimization. These are as follows:

1. Structure-Preserving Transformations
2. Algebraic Transformations
1. Structure preserving transformations:
The primary structure-preserving transformations on basic blocks are as follows:

o Common sub-expression elimination

o Dead code elimination

o Renaming of temporary variables

o Interchange of two independent adjacent statements


(a) Common sub-expression elimination:
A common sub-expression need not be computed over and over again. Instead, it can be
computed once and stored, and referenced from the store when it is encountered again.
1. a : = b + c

2. b : = a - d

3. c : = b + c

4. d : = a - d

In the above block, the second and fourth statements compute the same expression, a - d. So
the block can be transformed as follows:
1. a : = b + c

2. b : = a - d

3. c : = b + c

4. d : = b
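The same transformation can be sketched as a tiny pass over the block in Python (the statement encoding is an assumption): remember each expression already computed, reuse the variable holding it, and forget entries whose operands are reassigned.

block = [("a", "b", "+", "c"),
         ("b", "a", "-", "d"),
         ("c", "b", "+", "c"),
         ("d", "a", "-", "d")]

available = {}                           # (x, op, y) -> variable holding its value
for dst, x, op, y in block:
    key = (x, op, y)
    if key in available:
        print(f"{dst} := {available[key]}")          # reuse the stored value
    else:
        print(f"{dst} := {x} {op} {y}")
    # dst is reassigned: forget expressions built from the old dst
    available = {k: v for k, v in available.items()
                 if dst not in (k[0], k[2]) and v != dst}
    if dst not in (x, y):                # record the new value if still valid
        available[key] = dst

Running this prints the transformed block above, with d := b in place of the fourth computation.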

(b) Dead-code elimination

o It is possible that a program contains a large amount of dead code.

o This can happen when variables are declared and assigned but never subsequently used;
such statements serve no purpose.

o Suppose the statement x := y + z appears in a block and x is dead, that is, never
subsequently used. Then, without changing the value computed by the basic block, you can
safely remove this statement.
(c) Renaming temporary variables

A statement t := b + c can be changed to u := b + c, where t is a temporary variable and u is a
new temporary variable. All instances of t can then be replaced with u without changing the
value of the basic block.
(d) Interchange of statement
Suppose a block has the following two adjacent statements:
1. t1 : = b + c

2. t2 : = x + y

These two statements can be interchanged without affecting the value of the block, provided
the value of t1 does not affect the value of t2 (and vice versa).
2. Algebraic transformations:
o In an algebraic transformation, we change the set of expressions into an
algebraically equivalent set. Thus expressions such as x := x + 0 or x := x * 1 can be
eliminated from a basic block without changing the set of expressions computed.

o Constant folding is a related class of optimizations: at compile time, we evaluate
constant expressions and replace them by their values. Thus the expression 5 * 2.7
would be replaced by 13.5.

o Sometimes unexpected common sub-expressions are generated by relational
operators like <=, >=, <, >, +, = etc.

o Sometimes the associative law is applied to expose common sub-expressions
without changing the basic block value. If the source code has the assignments

1. a := b + c

2. e := c + d + b

the following intermediate code may be generated:

1. a := b + c

2. t := c + d

3. e := t + b

4.9 A SIMPLE CODE GENERATOR

● A code generator generates target code for a sequence of three-address statements and
effectively uses registers to store operands of the statements.
● For example, consider the three-address statement a := b + c. It can have the following
code sequences:

ADD Rj, Ri    Cost = 1    (if b is in Ri and c is in Rj; result left in Ri)
(or)
ADD c, Ri     Cost = 2    (if b is in Ri and c is in memory)
(or)
MOV c, Rj     Cost = 3    (move c from memory into Rj, then add)
ADD Rj, Ri
Register and Address Descriptors:
• A register descriptor is used to keep track of what is currently in each register. The register
descriptors show that initially all the registers are empty.
• An address descriptor stores the location where the current value of the name can be found
at run time.
A code-generation algorithm:
The algorithm takes as input a sequence of three-address statements constituting a basic
block. For each three-address statement of the form x : = y op z, perform the following
actions:
1. Invoke a function getreg to determine the location L where the result of the computation y
op z should be stored.

2. Consult the address descriptor for y to determine y’, the current location of y. Prefer the
register for y’ if the value of y is currently both in memory and a register. If the value of y is
not already in L, generate the instruction MOV y’ , L to place a copy of y in L.

3. Generate the instruction OP z’, L where z’ is a current location of z. Prefer a register to a
memory location if z is in both. Update the address descriptor of x to indicate that x is in
location L. If x is in L, update its descriptor and remove x from all other descriptors.

4. If the current values of y or z have no next uses, are not live on exit from the block, and are
in registers, alter the register descriptor to indicate that, after execution of x : = y op z , those
registers will no longer contain y or z.
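A simplified Python sketch of this algorithm (getreg here only takes a free register or spills one; a production getreg would also consult next-use and liveness information, and all names are illustrative):

registers = ["R0", "R1"]
reg_desc = {r: None for r in registers}  # register descriptor: register -> name held
addr_desc = {}                           # address descriptor: name -> location

def location(name):
    for r in registers:                  # prefer a register holding name
        if reg_desc[r] == name:
            return r
    return name                          # otherwise its memory address

def getreg():
    for r in registers:
        if reg_desc[r] is None:
            return r                     # a free register
    r = registers[-1]                    # none free: spill one to memory
    print(f"MOV {r}, {reg_desc[r]}")
    addr_desc[reg_desc[r]] = reg_desc[r]
    reg_desc[r] = None
    return r

def gen(x, y, op, z):                    # emit code for x := y op z
    L = getreg()                         # step 1
    if location(y) != L:                 # step 2: place a copy of y in L
        print(f"MOV {location(y)}, {L}")
    print(f"{op} {location(z)}, {L}")    # step 3
    reg_desc[L] = x                      # L now holds x
    addr_desc[x] = L

gen("t", "a", "SUB", "b")                # MOV a, R0 / SUB b, R0
gen("u", "a", "SUB", "c")                # MOV a, R1 / SUB c, R1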
Generating Code for Assignment Statements:
• The assignment d := (a-b) + (a-c) + (a-c) might be translated into the following three-
address code sequence:

t := a - b
u := a - c
v := t + u
d := v + u

Applying the algorithm above (with registers R0 and R1) gives the code sequence:

MOV a, R0
SUB b, R0
MOV a, R1
SUB c, R1
ADD R1, R0
ADD R1, R0
MOV R0, d
Generating Code for Indexed Assignments
For the indexed assignments a := b[i] and a[i] := b, with the index i in register Ri, the
generated code sequences are:

a := b[i]    MOV b(Ri), R
a[i] := b    MOV b, a(Ri)

Generating Code for Pointer Assignments
For the pointer assignments a := *p and *p := a, with the pointer p in register Rp, the
generated code sequences are:

a := *p    MOV *Rp, a
*p := a    MOV a, *Rp

Generating Code for Conditional Statements
On a machine where condition codes are set by arithmetic instructions, the pair of statements

x := y + z
if x < 0 goto z

can be implemented by the sequence:

MOV y, R0
ADD z, R0
MOV R0, x
CJ< z

4.10 PEEPHOLE OPTIMIZATION
Peephole optimization is a type of Code Optimization performed on a small part of the code.
It is performed on the very small set of instructions in a segment of code.
The small set of instructions or small part of code on which peephole optimization is
performed is known as peephole or window.
It basically works on the principle of replacement: a part of the code is replaced by shorter
and faster code without changing the output.
Peephole optimization is a machine-dependent optimization.

Objectives of Peephole Optimization:


The objectives of peephole optimization are:
1. To improve performance
2. To reduce memory footprint
3. To reduce code size
Peephole Optimization Techniques:
1. Redundant load and store elimination:
In this technique, redundant loads and stores are eliminated.
Initial code:
y = x + 5;
i = y;
z = i;
w = z * 3;

Optimized code:
y = x + 5;
i = y;
w = y * 3;
2. Constant folding:
Expressions whose values can be computed at compile time are replaced by their values.
Initial code:
x = 2 * 3;
Optimized code:
x = 6;
3. Strength Reduction:
The operators that consume higher execution time are replaced by the operators consuming
less execution time.

Initial code:
y = x * 2;
Optimized code:
y = x + x; or y = x << 1;
Initial code:
y = x / 2;
Optimized code:
y = x >> 1;
4. Null sequences:
Useless operations are deleted.
5. Combine operations:
Several operations are replaced by a single equivalent operation.
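As an illustration, here is a small Python sketch of a peephole pass with a two-instruction window; it implements technique 1, dropping a load that immediately reloads a value just stored from the same register (safe because the value is still in that register):

def peephole(code):
    out, i = [], 0
    while i < len(code):
        ins = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        # window: a store "MOV R, x" immediately followed by a load "MOV x, R"
        if (nxt and ins[0] == "MOV" and nxt[0] == "MOV"
                and ins[1] == nxt[2] and ins[2] == nxt[1]):
            out.append(ins)              # keep the store, drop the redundant load
            i += 2
        else:
            out.append(ins)
            i += 1
    return out

code = [("MOV", "Q", "R0"), ("ADD", "R", "R0"),
        ("MOV", "R0", "P"), ("MOV", "P", "R0"),   # redundant reload of P
        ("ADD", "T", "R0"), ("MOV", "R0", "S")]
for ins in peephole(code):
    print(*ins)                          # the reload MOV P, R0 is gone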

4.11 REGISTER ALLOCATION AND ASSIGNMENT:


Local register allocation
Register allocation done only within a basic block is local. A simple top-down approach
assigns registers to the most heavily used variables:
● Traverse the block
● Count uses
● Use the count as a priority function
● Assign registers to higher-priority variables first

Advantage
Heavily used values reside in registers
Disadvantage
Does not consider non-uniform distribution of uses
Need of global register allocation
Local allocation does not take into account that some instructions (e.g. those in loops)
execute more frequently. It forces us to store/load at basic block endpoints since each block
has no knowledge of the context of others.
To find the live range(s) of each variable and the area(s) where the variable is
used/defined, global allocation is needed. The cost of spilling will depend on the frequencies
and locations of uses.
Register allocation depends on:
Size of live range
Number of uses/definitions
Frequency of execution

Number of loads/stores needed.
Cost of loads/stores needed.
Register allocation by graph coloring
Global register allocation can be seen as a graph coloring problem.
Basic idea:
1. Identify the live range of each variable

2. Build an interference graph that represents conflicts between live ranges (two nodes are
connected if the variables they represent are live at the same moment)

3. Try to color the nodes of the graph with at most as many colors as there are registers, so
that two neighboring nodes always receive different colors (see the sketch below)
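A greedy Python sketch of the idea (illustrative only; real allocators use Chaitin-style simplify-and-spill rather than this simple greedy order):

interference = {                         # edges: variables live at the same time
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
registers = ["R0", "R1", "R2"]           # the available "colors"

assignment = {}
for var in interference:                 # color each live range in turn
    taken = {assignment[n] for n in interference[var] if n in assignment}
    free = [r for r in registers if r not in taken]
    assignment[var] = free[0] if free else "SPILL"   # no color left: spill
print(assignment)   # {'a': 'R0', 'b': 'R1', 'c': 'R2', 'd': 'R0'}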

4.12 DYNAMIC PROGRAMMING CODE-GENERATION

Introduction: The dynamic programming algorithm proceeds in three phases:

1. Compute bottom-up, for each node n of the expression tree T, an array C of costs, in
which the ith component C[i] is the optimal cost of computing the subtree S rooted at n
into a register, assuming i registers are available for the computation, for 1 ≤ i ≤ r.

2. Traverse T, using the cost vectors to determine which subtrees of T must be computed
into memory.

3. Traverse each tree using the cost vectors and associated instructions to generate the
final target code. The code for the subtrees computed into memory locations is generated
first.

Each of these phases can be implemented to run in time linearly proportional to the size of
the expression tree.

The cost of computing a node n includes whatever loads and stores are necessary to evaluate
S in the given number of registers. It also includes the cost of computing the operator at the
root of S. The zeroth component of the cost vector is the optimal cost of computing the
subtree S into memory. The contiguous evaluation property ensures that an optimal program
for S can be generated by considering combinations of optimal programs only for the subtrees
of the root of S. This restriction reduces the number of cases that need to be considered.

In order to compute the costs C[i] at node n, we view the instructions as tree-rewriting rules,
as in Section 8.9. Consider each template E that matches the input tree at node n. By
examining the cost vectors at the corresponding descendants of n, determine the costs of
evaluating the operands at the leaves of E. For those operands of E that are registers, consider
all possible orders in which the corresponding subtrees of T can be evaluated into registers. In
each ordering, the first subtree corresponding to a register operand can be evaluated using i
available registers, the second using i -1 registers, and so on. To account for node n, add in
the cost of the instruction associated with the template E. The value C[i] is then the minimum
cost over all possible orders.

The cost vectors for the entire tree T can be computed bottom up in time linearly proportional
to the number of nodes in T. It is convenient to store at each node the instruction used to
achieve the best cost for C[i] for each value of i. The smallest cost in the vector for the root of
T gives the minimum cost of evaluating T.
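A much-simplified Python sketch of phase (1), for trees of binary operators on the two-register example machine below (unit-cost LD, ST, and op; it considers only the op Ri, Ri, M form and the two register evaluation orders, ignoring the full template matching):

R = 2                                    # number of registers

def cost_vector(node):
    if isinstance(node, str):            # a leaf: a variable in memory
        return [0] + [1] * R             # (0, 1, 1): in memory already; LD costs 1
    _, left, right = node
    cl, cr = cost_vector(left), cost_vector(right)
    c = [0] * (R + 1)
    for i in range(1, R + 1):
        candidates = [cl[i] + cr[0] + 1]         # op Ri, Ri, M: right from memory
        if i >= 2:                               # both orders of register evaluation
            candidates.append(cl[i] + cr[i - 1] + 1)
            candidates.append(cr[i] + cl[i - 1] + 1)
        c[i] = min(candidates)
    c[0] = c[R] + 1                      # into memory: best register code, plus ST
    return c

tree = ("+", ("-", "a", "b"), "c")       # the expression (a - b) + c
print(cost_vector(tree))                 # [4, 3, 3]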

Example: Consider a machine having two registers R0 and R1, and the following
instructions, each of unit cost:

LD Ri, Mj      (load)
op Ri, Ri, Mj  (arithmetic, memory operand)
op Ri, Ri, Rj  (arithmetic, register operand)
LD Ri, Rj      (register-to-register copy)
ST Mi, Rj      (store)

In these instructions, Ri is either R0 or R1, and Mi is a memory location. The operator op
corresponds to arithmetic operators.
Let us apply the dynamic programming algorithm to generate optimal code for the syntax tree
in Fig. 8.26. In the first phase, we compute the cost vectors shown at each node. To illustrate
this cost computation, consider the cost vector at the leaf a. C[0], the cost of computing a into
memory, is 0, since it is already there. C[1], the cost of computing a into a register, is 1, since
we can load it into a register with the instruction LD R0, a. C[2], the cost of loading a into a
register with two registers available, is the same as that with one register available. The cost
vector at leaf a is therefore (0,1,1).

Consider the cost vector at the root. We first determine the minimum cost of computing the
root with one and two registers available. The machine instruction ADD R0, R0, M matches
the root, because the root is labeled with the operator +. Using this instruction, the minimum
cost of evaluating the root with one register available is the minimum cost of computing its
right subtree into memory, plus the minimum cost of computing its left subtree into the
register, plus 1 for the instruction. No other way exists. The cost vectors at the right and left
children of the root show that the minimum cost of computing the root with one register
available is 5 + 2 + 1 = 8.
Now consider the minimum cost of evaluating the root with two registers available. Three
cases arise depending on which instruction is used to compute the root and in what order the
left and right subtrees of the root are evaluated.

Dynamic programming techniques have been used in a number of compilers, including the
second version of the portable C compiler, PCC2. The technique facilitates retargeting
because of the applicability of the dynamic programming technique to a broad class of
machines.
