
UNIT-5 CODE OPTIMIZATION

5.1 Principal Sources of Optimization


5.2 Peephole Optimization
5.3 Optimization of Basic Blocks
5.3.1 DAG
5.4 Global Data Flow Analysis
5.5 Efficient Data Flow Algorithms

Code Optimization: Introduction


• Optimization is a program transformation technique that tries to improve
the code by making it consume fewer resources and deliver higher speed.
• A code-optimizing process must follow the three rules given below:
- The output code must not, in any way, change the meaning of the
program.
- Optimization should increase the speed of the program and, if
possible, the program should demand fewer resources.
- Optimization should itself be fast and should not delay the overall
compiling process.

Types of Optimizations
Machine-independent optimizations:
Machine-independent optimizations are program transformations
that improve the target code without taking into consideration any properties of the
target machine.
Machine-dependent optimizations:
Machine-dependent optimizations are based on register allocation and
utilization of special machine-instruction sequences.
• We will consider only Machine-Independent Optimizations—i.e., they don’t
take into consideration any properties of the target machine.
• The techniques used are a combination of Control-Flow and Data-Flow
analysis.
• - Control-Flow Analysis: Identifies loops in the flow graph of a program
since such loops are usually good candidates for improvement.
• - Data-Flow Analysis: Collects information about the way variables are
used in a program.
Properties of Optimizing Compilers:
Compiling the source code should produce correct target code.
Dead code should be completely removed from the source program.
While applying the optimizing transformations, the semantics of the source
program should not be changed.
Basic Blocks
A basic block is a sequence of consecutive statements in which flow of
control enters at the beginning and leaves at the end without any halt or
possibility of branching except at the end.
The following sequence of three-address statements forms a basic block:
t1 : = a * a
t2 : = a * b
t3 : = 2 * t2
t4 : = t1 + t3
t5 : = b * b
t6 : = t4 + t5

Basic Block Construction


Algorithm: Partition into basic blocks
Input: A sequence of three-address statements
Output: A list of basic blocks with each three-address statement in exactly one
block
Method:
1. We first determine the set of leaders, i.e., the first statements of basic blocks.
The rules we use are the following:
a. The first statement is a leader.
b. Any statement that is the target of a conditional or unconditional goto is a
leader.
c. Any statement that immediately follows a goto or conditional goto
statement is a leader.
2. For each leader, its basic block consists of the leader and all statements up to but
not including the next leader or the end of the program.
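As a minimal C sketch of step 1, assuming a simple hypothetical instruction representation (the struct and field names are illustrative, not part of the algorithm's statement):

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical three-address instruction: we record only whether it
   is a conditional/unconditional goto and, if so, its target index. */
struct Instr {
    bool is_jump;   /* goto or conditional goto */
    int  target;    /* 0-based index of the statement jumped to */
};

/* Mark the leaders of a sequence of n instructions. */
void find_leaders(const struct Instr *code, int n, bool *leader) {
    for (int i = 0; i < n; i++)
        leader[i] = false;
    if (n > 0)
        leader[0] = true;                      /* rule (a): first statement */
    for (int i = 0; i < n; i++) {
        if (code[i].is_jump) {
            leader[code[i].target] = true;     /* rule (b): jump target */
            if (i + 1 < n)
                leader[i + 1] = true;          /* rule (c): follows a jump */
        }
    }
}

int main(void) {
    struct Instr code[12] = {0};
    code[11].is_jump = true;                   /* (12) if i<=20 goto (3) */
    code[11].target = 2;
    bool leader[12];
    find_leaders(code, 12, leader);
    for (int i = 0; i < 12; i++)
        if (leader[i]) printf("leader at statement (%d)\n", i + 1);
    return 0;
}

Running this on the 12-statement program in the example below marks statements (1) and (3) as leaders; each basic block then runs from one leader up to, but not including, the next leader, per step 2.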
Example
begin
prod :=0;
i:=1;
do begin
prod :=prod+ a[i] * b[i];
i :=i+1;
end
while i <= 20
end

Three Address Code:


(1) prod := 0
(2) i := 1
(3) t1 := 4* i
(4) t2 := a[t1] /*compute a[i] */
(5) t3 := 4* i
(6) t4 := b[t3] /*compute b[i] */
(7) t5 := t2*t4
(8) t6 := prod+t5
(9) prod := t6
(10) t7 := i+1
(11) i := t7
(12) if i<=20 goto (3)
Basic block 1: Statement (1) to (2)
Basic block 2: Statement (3) to (12)

Flow Graphs
Flow graph is a directed graph containing the flow-of-control information
for the set of basic blocks making up a program.
The nodes of the flow graph are basic blocks. It has a distinguished initial
node.

• B1 is the initial node. B2 immediately follows B1, so there is an edge from
B1 to B2. The target of the jump from the last statement of B2 is the first
statement of B2, so there is an edge from B2 (last statement) to B2 (first
statement).
• B1 is the predecessor of B2, and B2 is a successor of B1.
5.1 PRINCIPAL SOURCES OF OPTIMIZATION
• A compiler optimization must preserve the semantics of the original
program.
• Except in very special circumstances, once a programmer chooses and
implements a particular algorithm, the compiler cannot understand enough
about the program to replace it with a substantially different and more
efficient algorithm.
• A compiler knows only how to apply relatively low-level semantic
transformations, using general facts such as algebraic identities like i + 0 =
i.
5.1.1 Causes of Redundancy
• There are many redundant operations in a typical program. Sometimes the
redundancy is available at the source level.
• For instance, a programmer may find it more direct and convenient to
recalculate some result, leaving it to the compiler to recognize that only one
such calculation is necessary. But more often, the redundancy is a side
effect of having written the program in a high-level language.
• As a program is compiled, each of these high level data structure accesses
expands into a number of low-level pointer arithmetic operations, such
as the computation of the location of the (i, j)th element of a matrix A.
• Accesses to the same data structure often share many common low-level
operations. Programmers are not aware of these low-level operations and
cannot eliminate the redundancies themselves.
5.1.2 A Running Example: Quicksort
Consider a fragment of a sorting program called quicksort to illustrate several
important code improving transformations. The C program for quicksort is given
below
void quicksort(int m, int n)
/* recursively sorts a[m] through a[n] */
{
    int i, j;
    int v, x;
    if (n <= m) return;
    /* fragment begins here */
    i = m-1; j = n; v = a[n];
    while (1) {
        do i = i+1; while (a[i] < v);
        do j = j-1; while (a[j] > v);
        if (i >= j) break;
        x = a[i]; a[i] = a[j]; a[j] = x;    /* swap a[i], a[j] */
    }
    x = a[i]; a[i] = a[n]; a[n] = x;        /* swap a[i], a[n] */
    /* fragment ends here */
    quicksort(m, j); quicksort(i+1, n);
}
Fig 5.1: C code for quicksort

Intermediate code for the marked fragment of the program in Figure 5.1 is shown
in Figure 5.2. In this example we assume that integers occupy four bytes. The
assignment x = a[i] is translated into the two three-address statements t6 = 4*i
and x = a[t6], as shown in steps (14) and (15) of Figure 5.2. Similarly, a[j] = x becomes
t10=4*j and a[t10]=x in steps (20) and (21).

Fig 5.2: Three-address code for fragment in Figure.5.1

Fig 5.3: Flow graph for the program in Figure 5.2
Figure 5.3 is the flow graph for the program in Figure 5.2. Block B1 is the entry
node. All conditional and unconditional jumps to statements in Figure 5.2 have
been replaced in Figure 5.3 by jumps to the block of which the statements are
leaders. In Figure 5.3, there are three loops. Blocks B2 and B3 are loops by
themselves. Blocks B2, B3, B4, and B5 together form a loop, with B2 the only
entry point.
5.1.3 Semantics-Preserving Transformations
There are a number of ways in which a compiler can improve a program without
changing the function it computes. Common subexpression elimination, copy
propagation, dead-code elimination, and constant folding are common examples of
such function-preserving (or semantics preserving) transformations.

(a) Before (b) After


Fig 5.4: Local Common-Subexpression Elimination
Some of these duplicate calculations cannot be avoided by the programmer
because they lie below the level of detail accessible within the source language.
For example, block B5 shown in Figure 5.4(a) recalculates 4 * i and 4 *j,
although none of these calculations were requested explicitly by the programmer.
5.1.4 Global Common Subexpressions
An occurrence of an expression E is called a common subexpression if E was
previously computed and the values of the variables in E have not changed since
the previous computation. We avoid re-computing E if we can use its previously
computed value; that is, the variable x to which the previous computation of
E was assigned has not changed in the interim.
The assignments to t7 and t10 in Figure 5.4(a) compute the common
subexpressions 4 * i and 4 * j, respectively. These steps have been eliminated in
Figure 5.4(b), which uses t6 instead of t7 and t8 instead of t10.
Figure 5.5 shows the result of eliminating both global and local common
subexpressions from blocks B5 and B6 in the flow graph of Figure 5.3. We first
discuss the transformation of B5 and then mention some subtleties involving
arrays.
After local common subexpressions are eliminated, B5 still evaluates 4*i and
4 * j, as shown in Figure 5.4(b). Both are common subexpressions; in
particular, the three statements
t8=4*j
t9=a[t8]
a[t8]=x
in B5 can be replaced by
t9=a[t4]
a[t4]=x
using t4 computed in block B3. In Figure 5.5, observe that as control passes from
the evaluation of 4 * j in B3 to B5, there is no change to j and no change to t4, so t4
can be used if 4 * j is needed.
Another common subexpression comes to light in B5 after t4 replaces t8. The new
expression a[t4] corresponds to the value of a[j] at the source level. Not only does j
retain its value as control leaves B3 and then enters B5, but a[j], a value
computed into a temporary t5, does too, because there are no assignments to
elements of the array a in the interim. The statements
t9=a[t4]
a[t6]=t9
in B5 therefore can be replaced by
a[t6]=t5
Analogously, the value assigned to x in block B5 of Figure 5.4(b) is seen to be
the same as the value assigned to t3 in block B2. Block B5 in Figure 5.5 is the result
of eliminating common subexpressions corresponding to the values of the source
level expressions a[i] and a[j] from B5 in Figure 5.4(b). A similar series of
transformations has been done to B6 in Figure 5.5.
The expression a[t1] in blocks B1 and B6 of Figure 5.5 is not considered a
common subexpression, although t1 can be used in both places. After control
leaves B1 and before it reaches B6, it can go through B5, where there are
assignments to a. Hence, a[t1] may not have the same value on reaching B6 as it
did on leaving B1, and it is not safe to treat a[t1] as a common subexpression.

Fig 5.5: B5 and B6 after common-subexpression elimination


5.1.5 Copy Propagation
Block B5 in Figure 5.5 can be further improved by eliminating x, using two
new transformations. One concerns assignments of the form u = v, called copy
statements, or copies for short. Copies would have arisen much sooner, because
the normal algorithm for eliminating common subexpressions introduces them, as
do several other algorithms.

(a) (b)
Fig 5.6: Copies introduced during common subexpression elimination
In order to eliminate the common subexpression from the statement c = d+e in
Figure 5.6(a), we must use a new variable t to hold the value of d + e. The value of
variable t, instead of that of the expression d + e, is assigned to c in Figure 5.6(b).
Since control may reach c = d+e either after the assignment to a or after the
assignment to b, it would be incorrect to replace c = d+e by either c = a or by c =
b.
The idea behind the copy-propagation transformation is to use v for u, wherever
possible after the copy statement u = v. For example, the assignment x = t3 in
block B5 of Figure 5.5 is a copy. Copy propagation applied to B5 yields the code in
Figure 5.7. This change may not appear to be an improvement, but, it gives us the
opportunity to eliminate the assignment to x.

Fig 5.7: Basic block B5 after copy propagation


5.1.6 Dead-Code Elimination
A variable is live at a point in a program if its value can be used
subsequently; otherwise, it is dead at that point. A related idea is dead (or
useless) code - statements that compute values that never get used. While the
programmer is unlikely to introduce any dead code intentionally, it may appear as
the result of previous transformations.
Deducing at compile time that the value of an expression is a constant and
using the constant instead is known as constant folding.
One advantage of copy propagation is that it often turns the copy statement
into dead code. For example, copy propagation followed by dead-code elimination
removes the assignment to x and transforms the code in Figure 5.7 into
a[t2] = t5
a[t4] = t3
goto B2
This code is a further improvement of block B5 in Figure 5.5.
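The effect of the pair of transformations can be seen in a small runnable C sketch; the temporaries t2-t5 mirror the block's names, but the functions and array values here are hypothetical:

#include <stdio.h>

int a[8] = {10, 20, 30, 40, 50, 60, 70, 80};

/* Before: x is a pure copy of t3 and is dead after its one use. */
void block_before(int t2, int t3, int t4, int t5) {
    int x = t3;        /* copy statement */
    a[t2] = t5;
    a[t4] = x;         /* only use of x */
}

/* After copy propagation (t3 replaces x) plus dead-code elimination
   (the copy x = t3 computes a value never used again, so it is dropped). */
void block_after(int t2, int t3, int t4, int t5) {
    a[t2] = t5;
    a[t4] = t3;
}

int main(void) {
    block_before(0, 1, 2, 3);
    block_after(4, 1, 6, 3);
    printf("%d %d %d %d\n", a[0], a[2], a[4], a[6]);  /* 3 1 3 1 */
    return 0;
}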


5.1.7 Code Motion
Loops are a very important place for optimizations, especially the inner
loops where programs tend to spend the bulk of their time. The running time of a
program may be improved if we decrease the number of instructions in an inner
loop, even if we increase the amount of code outside that loop.
An important modification that decreases the amount of code in a loop is
code motion. This transformation takes an expression that yields the same
result independent of the number of times a loop is executed (a loop-invariant
computation) and evaluates the expression before the loop.
Evaluation of limit - 2 is a loop-invariant computation in the following while
statement:
while (i <= limit-2) /* statement does not change limit */
Code motion will result in the equivalent code
t = limit-2
while ( i <= t ) /* statement does not change limit or t */
Now, the computation of limit - 2 is performed once, before we enter the loop.
Previously, there would be n + 1 calculations of limit - 2 if we iterated the body of the
loop n times.
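A runnable C sketch of the same transformation (the function names and loop body are illustrative):

#include <stdio.h>

/* Before: limit - 2 is (conceptually) evaluated on every loop test. */
int count_before(int limit) {
    int i = 0, n = 0;
    while (i <= limit - 2) { n++; i++; }
    return n;
}

/* After code motion: the loop-invariant limit - 2 is computed once. */
int count_after(int limit) {
    int i = 0, n = 0;
    int t = limit - 2;             /* hoisted out of the loop */
    while (i <= t) { n++; i++; }
    return n;
}

int main(void) {
    printf("%d %d\n", count_before(10), count_after(10));  /* 9 9 */
    return 0;
}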
5.1.8 Induction Variables and Reduction in Strength
Another important optimization is to find induction variables in loops and
optimize their computation. A variable x is said to be an "induction variable" if
there is a positive or negative constant c such that each time x is assigned, its
value changes by c. For instance, i and t2 are induction variables in the loop
containing B2 of Figure 5.5. Induction variables can be computed with a single
increment (addition or subtraction) per loop iteration.
Strength reduction is the replacement of a computation with a less expensive one.
Thus, we shall see how this optimization applies to our quicksort example by
beginning with one of the innermost loops: B3 by itself. Note that the values of j and
t4 remain in lock step; every time the value of j decreases by 1, the value of t4
decreases by 4, because 4 * j is assigned to t4. These variables, j and t4, thus form
a good example of a pair of induction variables.
When there are two or more induction variables in a loop, it may be possible to get
rid of all but one. For the inner loop of B3 in Fig. 5.8, we cannot get rid of either j
or t4 completely; t4 is used in B3 and j is used in B4. However, we can illustrate
reduction in strength and a part of the process of induction-variable elimination.
Eventually, j will be eliminated when the outer loop consisting of blocks B2, B3, B4
and B5 is considered.
Example:
Consider the assignment t4 = 4*j in block B3. Since j is decremented by 1 each
time, the new value of 4*j is the old value of t4 minus 4. Thus, we may replace
t4 = 4*j by t4 = t4 - 4.
Problem: We need to initialize t4 to t4 = 4*j before entering block B3.
Result: The substitution of a multiplication by a subtraction will speed up
the resulting code.
B2: i = i + 1        B3: j = j - 1
    t2 = 4 * i           t4 = 4 * j
Fig 5.8: Strength reduction applied to 4 * j in block B3
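The effect of the replacement can be seen in a runnable source-level C sketch, assuming 4-byte ints so that t4 models the byte offset 4 * j (the array contents and the scan function are illustrative):

#include <stdio.h>

int a[] = {0, 9, 7, 5, 3, 1};

/* Scan downward from a[j] for the first element <= v, keeping the
   byte offset t4 = 4*j up to date by subtraction, not multiplication. */
int scan(int j, int v) {
    int t4 = 4 * j;                              /* initialize before the loop */
    do {
        j = j - 1;
        t4 = t4 - 4;                             /* replaces t4 = 4 * j */
    } while (*(int *)((char *)a + t4) > v);      /* a[t4]: byte-offset access */
    return j;
}

int main(void) {
    printf("%d\n", scan(5, 4));                  /* prints 4: a[4] == 3 <= 4 */
    return 0;
}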

After reduction in strength is applied to the inner loops around B2 and B3, the
only use of i and j is to determine the outcome of the test in block B4. We know
that the values of i and t2 satisfy the relationship t2 = 4 * i, while those of j and
t4 satisfy the relationship t4 = 4* j. Thus, the test t2 >= t4 can substitute for i >=
j. Once this replacement is made, i in block B2 and j in block B3 become dead
variables, and the assignments to them in these blocks become dead code that can
be eliminated. The resulting flow graph is shown in Figure. 5.9.
Fig 5.9: Flow graph after induction-variable elimination
Note:
1. Code motion, induction variable elimination and strength reduction are loop
optimization techniques.
2. Common subexpression elimination, copy propagation, dead code elimination
and constant folding are function-preserving transformations.

5.2 PEEPHOLE OPTIMIZATION


Most production compilers produce good code through careful instruction
selection and register allocation; a few use an alternative strategy: they generate
naive code and then improve the quality of the target code by applying
"optimizing" transformations to the target program.
The term "optimizing" is somewhat misleading, since there is no guarantee of
optimality; nevertheless, many simple transformations can significantly
improve the running time or space requirement of the target program.
A simple but effective technique for locally improving the target code
is peephole optimization, which is done by examining a sliding window of
target instructions (called the peephole) and replacing instruction
sequences within the peephole by a shorter or faster sequence, whenever
possible.
Peephole optimization can also be applied directly after intermediate code
generation to improve the intermediate representation.
The peephole is a small, sliding window on a program. The code in the peephole
need not be contiguous, although some implementations do require this.
Characteristics of peephole optimization:
• Each improvement may spawn opportunities for additional improvements.
• Repeated passes over the target code are necessary to get the maximum
benefit.
Examples of program transformations that are characteristic of peephole
optimizations:
• Redundant-instruction elimination
• Flow-of-control optimizations
• Algebraic simplifications
• Use of machine idioms
5.2.1 Eliminating Redundant Loads and Stores
If we see the instruction sequence in a target program,
LD a, R0
ST R0, a
we can delete the store instruction, because whenever it is executed, the first
instruction will have ensured that the value of a is already loaded into register
R0.
Note that if the store instruction had a label, we could not be sure that the first
instruction was always executed immediately before the second, and so we could
not remove the store instruction.
Put another way, the two instructions have to be in the same basic block for this
transformation to be safe.
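A minimal C sketch of this peephole rule over a hypothetical target-instruction array (the representation is illustrative):

#include <stdbool.h>
#include <string.h>

/* Hypothetical target instruction: opcode, register, memory address,
   and whether the instruction carries a label (possible jump target). */
struct TInstr {
    const char *op;     /* "LD" or "ST" */
    const char *reg;    /* e.g. "R0" */
    const char *addr;   /* e.g. "a" */
    bool labeled;
};

/* Delete every ST R,x that immediately follows LD x,R, provided the
   store is unlabeled (i.e., both lie in the same basic block).
   Compacts the array in place and returns the new length. */
int elim_redundant_stores(struct TInstr *code, int n) {
    int out = 0;
    for (int i = 0; i < n; i++) {
        bool drop = out > 0 && !code[i].labeled &&
                    strcmp(code[i].op, "ST") == 0 &&
                    strcmp(code[out - 1].op, "LD") == 0 &&
                    strcmp(code[i].reg, code[out - 1].reg) == 0 &&
                    strcmp(code[i].addr, code[out - 1].addr) == 0;
        if (!drop)
            code[out++] = code[i];
    }
    return out;
}

The labeled check is exactly the same-basic-block condition described above: a labeled store may be reached by a jump, so it cannot be removed.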
5.2.2 Eliminating Unreachable Code
Another opportunity for peephole optimization is the removal of unreachable
instructions. An unlabeled instruction immediately following an unconditional
jump may be removed. This operation can be repeated to eliminate a sequence of
instructions.
For example, for debugging purposes, a large program may have within it certain
code fragments that are executed only if a variable debug is equal to 1.
In C, the source code might look like:

#define debug 0
….

if ( debug ) {
Print debugging information

}
In the intermediate representation, this code may look like
if debug == 1 goto L1
goto L2
L1: print debugging information
L2:
One obvious peephole optimization is to eliminate jumps over jumps. Thus, no
matter what the value of debug, the code sequence above can be replaced by
if debug != 1 goto L2
print debugging information
L2 :
If debug is set to 0 at the beginning of the program, constant propagation would
transform this sequence into
if 0 != 1 goto L2
print debugging information
L2:
Now the argument of the first statement always evaluates to true, so the
statement can be replaced by goto L2. Then all statements that print debugging
information are unreachable and can be eliminated one at a time.
5.2.3 Flow-of-Control Optimizations
Simple intermediate code-generation algorithms frequently produce jumps to
jumps, jumps to conditional jumps, or conditional jumps to jumps. These
unnecessary jumps can be eliminated in either the intermediate code or the
target code by the following types of peephole optimizations. We can replace the
sequence
goto L1
L1 : goto L2
by the sequence
goto L2
L1 : goto L2
If there are now no jumps to L1, then it may be possible to eliminate the
statement L1 : goto L2 provided it is preceded by an unconditional jump.
Similarly, the sequence
if a < b goto L1
L1 : goto L2
can be replaced by the sequence
if a < b goto L2
L1 : goto L2
Finally, suppose there is only one jump to L1 and L1 is preceded by an
unconditional goto. Then the sequence
goto L1
...
L1: if a < b goto L2
L3:
may be replaced by the sequence
if a < b goto L2
goto L3
...
L3:
While the number of instructions in the two sequences is the same, we sometimes
skip the unconditional jump in the second sequence, but never in the first. Thus, the
second sequence is superior to the first in execution time.
5.2.4 Algebraic Simplification and Reduction in Strength
Algebraic identities can be used by a peephole optimizer to eliminate
three-address statements such as
x = x + 0
or
x = x * 1
in the peephole.
Similarly, reduction-in-strength transformations can be applied in the
peephole to replace expensive operations by equivalent cheaper ones on the
target machine. Certain machine instructions are considerably cheaper than
others and can often be used as special cases of more expensive operators.
For example, x^2 is invariably cheaper to implement as x * x than as a call to an
exponentiation routine.
• Fixed-point multiplication or division by a power of two is cheaper to
implement as a shift.
• Floating-point division by a constant can be approximated as multiplication
by the reciprocal constant, which may be cheaper, e.g., x / 2 = x * 0.5.
5.2.5 Use of Machine Idioms
• The target machine may have hardware instructions to implement certain
specific operations efficiently.
• Detecting situations that permit the use of these instructions can reduce
execution time significantly.
For example, some machines have auto-increment and auto-decrement
addressing modes. These add or subtract one from an operand before or after
using its value.
The use of the modes greatly improves the quality of code when pushing or
popping a stack, as in parameter passing. These modes can also be used in code for
statements like x = x + 1.
x = x + 1 → x++
x = x - 1 → x- -

5.3 OPTIMIZATION OF BASIC BLOCKS


We can often obtain a substantial improvement in the running time of code
merely by performing local optimization within each basic block by itself.
5.3.1 DIRECTED ACYCLIC GRAPHS (DAG)
Like the syntax tree for an expression, a DAG has leaves corresponding to atomic
operands and interior nodes corresponding to operators. The difference is that a
node N in a DAG has more than one parent if N represents a common
subexpression; in a syntax tree, the tree for the common subexpression would be
replicated as many times as the subexpression appears in the original expression.
Thus, a DAG not only represents expressions more succinctly, it gives the
compiler important clues regarding the generation of efficient code to evaluate the
expressions.
Example: The DAG for the expression a + a * (b - c) + (b - c) * d is built by the
sequence of steps shown below. The leaf for a has two parents, because a appears
twice in the expression. More interestingly, the two occurrences of the common
subexpression b - c are represented by one node, the node labeled -. That node has
two parents, representing its two uses in the subexpressions a * (b - c) and
(b - c) * d. Even though b and c appear twice in the complete expression, their
nodes each have one parent, since both uses are in the common subexpression b - c.

Fig 5.10: Dag for the expression a + a * (b - c) + (b - c) * d


Table 5.1: Syntax-directed definition to produce syntax trees or DAG's
S. No  PRODUCTION   SEMANTIC RULES
1      E → E1 + T   E.node = new Node('+', E1.node, T.node)
2      E → E1 - T   E.node = new Node('-', E1.node, T.node)
3      E → T        E.node = T.node
4      T → (E)      T.node = E.node
5      T → id       T.node = new Leaf(id, id.entry)
6      T → num      T.node = new Leaf(num, num.val)
The syntax-directed definition (SDD) of Table 5.1 can construct either syntax
trees or DAG's. It constructs syntax trees when the functions Leaf and Node
create a fresh node each time they are called. It will
construct a DAG if, before creating a new node, these functions first check
whether an identical node already exists. If a previously created identical node
exists, the existing node is returned. For instance, before constructing a new
node, Node(op, left, right) we check whether there is already a node with label op,
and children left and right, in that order. If so, Node returns the existing node;
otherwise, it creates a new node.
1) p1 = Leaf(id, entry-a)
2) p2 = Leaf(id, entry-a) = p1
3) p3 = Leaf(id, entry-b)
4) p4 = Leaf(id, entry-c)
5) p5 = Node('-', p3, p4)
6) p6 = Node('*', p1, p5)
7) p7 = Node('+', p1, p6)
8) p8 = Leaf(id, entry-b) = p3
9) p9 = Leaf(id, entry-c) = p4
10) p10 = Node('-', p3, p4) = p5
11) p11 = Leaf(id, entry-d)
12) p12 = Node('*', p5, p11)
13) p13 = Node('+', p7, p12)
Assume that entry-a points to the symbol-table entry for a, and similarly for the
other identifiers. When the call to Leaf(id, entry-a) is repeated at step 2, the node
created by the previous call is returned, so p2 = p1. Similarly, the nodes returned at
steps 8 and 9 are the same as those returned at steps 3 and 4 (i.e., p8 = p3 and p9
= p4). Hence the node returned at step 10 must be the same as that returned at
step 5; i.e., p10 = p5.
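The check can be sketched in C; a real implementation would use a hash table keyed on the label and children (a form of value numbering), but a linear search over the nodes created so far shows the idea (all names here are illustrative):

#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct Node {
    char op;                     /* '+', '-', '*', or 0 for a leaf */
    const char *id;              /* leaf label (e.g. "a"), NULL otherwise */
    struct Node *left, *right;   /* children, NULL for leaves */
};

static struct Node *made[256];
static int nmade = 0;

/* Return an existing node with the same label and the same children,
   in the same order, if one exists; otherwise create a fresh node.
   This single check turns the syntax-tree builder into a DAG builder. */
struct Node *mk(char op, const char *id, struct Node *l, struct Node *r) {
    for (int i = 0; i < nmade; i++) {
        struct Node *n = made[i];
        bool same_id = (n->id == NULL && id == NULL) ||
                       (n->id != NULL && id != NULL && strcmp(n->id, id) == 0);
        if (n->op == op && n->left == l && n->right == r && same_id)
            return n;                        /* reuse: common subexpression */
    }
    struct Node *n = malloc(sizeof *n);
    n->op = op; n->id = id; n->left = l; n->right = r;
    made[nmade++] = n;
    return n;
}

With this check, a repeated call such as the one for Leaf(id, entry-a) at step 2 returns the pointer created at step 1, and step 10 returns the node built at step 5.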
The DAG Representation of Basic Blocks
Many important techniques for local optimization begin by transforming a basic
block into a DAG (directed acyclic graph). The idea extends naturally to the
collection of expressions that are created within one basic block. We construct a
DAG for a basic block as follows:
1. There is a node in the DAG for each of the initial values of the variables
appearing in the basic block.
2. There is a node N associated with each statement s within the block. The
children of N are those nodes corresponding to statements that are the last
definitions, prior to s, of the operands used by s.
3. Node N is labeled by the operator applied at s, and also attached to N is the list of
variables for which it is the last definition within the block.
4. Certain nodes are designated output nodes. These are the nodes whose
variables are live on exit from the block; that is, their values may be used later, in
another block of the flow graph. Calculation of these "live variables" is a matter
for global flow analysis. The DAG representation of a basic block lets us perform
several code-improving transformations on the code represented by the block.
a) We can eliminate local common subexpressions, that is, instructions that
compute a value that has already been computed.
b) We can eliminate dead code, that is, instructions that compute a value that is
never used.
c) We can reorder statements that do not depend on one another; such reordering
may reduce the time a temporary value needs to be preserved in a register.
d) We can apply algebraic laws to reorder operands of three-address instructions,
and sometimes thereby simplify the computation.
Example: Construct DAG from the basic block.
1 t1 = 4*i
2 t2 = a[t1]
3 t3 = 4*i
4 t4 = b[t3]
5 t5 = t2*t4
6 t6 = prod + t5
7 t7 = i+1
8 i = t7
9 if i<=20 goto 1
Fig 5.11: Step by step construction of DAG
5.3.2 Finding Local Common Subexpressions
Common subexpressions can be detected by noticing, as a new node M is about to
be added, whether there is an existing node N with the same children, in the same
order, and with the same operator. If so, N computes the same value as M and may
be used in its place.
Example: A DAG for the block
a = b + c
b = a - d
c = b + c
d = a - d
is shown in Figure 5.12. When we construct the node for the third
statement c = b + c, we know that the use of b in b + c refers to the node of Figure
5.12 labeled -, because that is the most recent definition of b. Thus, we do not
confuse the values computed at statements one and three.

Fig 5.12: DAG for basic block


However, the node corresponding to the fourth statement d = a - d has the
operator - and the nodes with attached variables a and d0 as children. Since the
operator and the children are the same as those for the node corresponding to
statement two, we do not create this node, but add d to the list of definitions for
the node labeled -.
If b is not live on exit from the block, then we do not need to compute that
variable, and can use d to receive the value represented by the node labeled -.
a = b + c
d = a - d
c = d + c
However, if both b and d are live on exit, then a fourth statement must be used to
copy the value from one to the other.'
Example: When we look for common subexpressions, we really are looking for
expressions that are guaranteed to compute the same value, no matter how that
value is computed. Thus, the DAG method will miss the fact that the expression
computed by the first and fourth statements in the sequence
a = b + c
b = b - d
c = c + d
e = b + c
is the same, namely b0 + c0. That is, even though b and c both change between the
first and last statements, their sum remains the same, because b + c = (b - d) + (c +
d). The DAG for this sequence is shown in Fig. 5.13, but does not exhibit any
common subexpressions.

Fig 5.13: DAG for basic block


5.3.3 Dead Code Elimination
The operation on DAG's that corresponds to dead-code elimination can be
implemented as follows. We delete from a DAG any root (node with no ancestors)
that has no live variables attached. Repeated application of this transformation
will remove all nodes from the DAG that correspond to dead code.
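A minimal C sketch of this operation, assuming a simple array-of-nodes DAG representation (illustrative, not the text's data structure):

#include <stdbool.h>

struct DNode {
    int  left, right;   /* child indices, or -1 if none */
    bool live;          /* some live variable is attached to this node */
    bool deleted;
};

/* Repeatedly delete any root (a node no surviving node points to)
   that has no live variable attached, until nothing changes. */
void dag_dce(struct DNode *n, int count) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < count; i++) {
            if (n[i].deleted || n[i].live)
                continue;
            bool is_root = true;
            for (int j = 0; j < count; j++)
                if (!n[j].deleted && (n[j].left == i || n[j].right == i))
                    is_root = false;        /* i still has a parent */
            if (is_root) {
                n[i].deleted = true;        /* e.g. the root for e, then c */
                changed = true;
            }
        }
    }
}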
Example : If, in Fig. 5.13, a and b are live but c and e are not, we can
immediately remove the root labeled e. Then, the node labeled c becomes a root
and can be removed. The roots labeled a and b remain, since they each have live
variables attached.

Fig 5.14: DAG after Dead Code Elimination


5.3.4 The Use of Algebraic Identities
Algebraic identities represent another important class of optimizations on basic
blocks. For example, we may apply arithmetic identities such as
x + 0 = 0 + x = x
x * 1 = 1 * x = x
x - 0 = x
x / 1 = x
to eliminate computations from a basic block.
Another class of algebraic optimizations includes local reduction in strength, that is,
replacing a more expensive operator by a cheaper one, as in:
x^2 = x * x
2 * x = x + x
x / 2 = x * 0.5
A third class of related optimizations is constant folding. Here we evaluate
constant expressions at compile time and replace the constant expressions by
their value. Thus the expression 2 * 3.14 would be replaced by 6.28. Many
constant expressions arise in practice because of the frequent use of symbolic
constants in programs.
5.3.5 Representation of Array References
Consider for instance the sequence of three-address statements:
x = a[i]
a[j] = y
z = a[i]
The above code can be "optimized" by replacing the third instruction z = a[i] by the
simpler z = x. However, this replacement is valid only if j can never equal i, so it
cannot be performed safely without that knowledge.
The proper way to represent array accesses in a DAG is as follows.
1. An assignment from an array, like x = a[i], is represented by creating a node with
operator =[ ] and two children representing the initial value of the array, a0 in this
case, and the index i. Variable x becomes a label of this new node.
2. An assignment to an array, like a[j] = y, is represented by a new node with
operator [ ]= and three children representing a0, j and y. There is no variable
labeling this node. What is different is that the creation of this node kills all
currently constructed nodes whose value depends on a0. A node that has been
killed cannot receive any more labels; that is, it cannot become a common
subexpression.
Example: The DAG for the basic block
x = a[i]
a[j] = y
z = a[i]
is shown in Fig. 5.15.
The node N for x is created first, but when the node labeled [ ] = is created, N is
killed.
Thus, when the node for z is created, it cannot be identified with N, and a new
node with the same operands a0 and i0 must be created instead.
Fig 5.15: The DAG for a sequence of array assignments
Example: Sometimes a node must be killed even though none of its children has
an array (like a0 in the previous example) as attached variable. Likewise, a node
may be killed if it has a descendant that is an array, even though none of its
children are array nodes. For instance, consider the three-address code
b = 12 + a
x = b[i]
b[j] = y
What is happening here is that, for efficiency reasons, b has been defined to be a
position in an array a. For example, if the elements of a are four bytes long, then b
represents the fourth element of a. If j and i represent the same value, then b [i] and
b[j] represent the same location.
Therefore it is important to have the third instruction, b[j] = y, kill the node with x as
its attached variable.

Fig 5.16: A node that kills a use of an array need not have that array as a
child
However, as we see in Fig. 5.16, both the killed node and the node that does the
killing have a0 as a grandchild, not as a child.
5.3.6 Pointer Assignments and Procedure Calls
When we assign indirectly through a pointer, as in the assignments
x=*p
*q=y
we do not know what p or q point to. In effect, x = *p is a use of every variable
whatsoever, and *q = y is a possible assignment to every variable. As a
consequence, the operator =* must take all nodes that are currently associated
with identifiers as arguments, which is relevant for dead-code elimination.
More importantly, the *= operator kills all other nodes so far constructed in the
DAG.
There are global pointer analyses one could perform that might limit the set of
variables a pointer could reference at a given place in the code. Even local
analysis could restrict the scope of a pointer. For instance, in the sequence
p=&x
*p=y
we know that x, and no other variable, is given the value of y, so we don't need to kill
any node but the node to which x was attached.
Procedure calls behave much like assignments through pointers. In the absence
of global data-flow information, we must assume that a procedure uses and
changes any data to which it has access.Thus, if variable x is in the scope of a
procedure P, a call to P both uses the node with attached variable x and kills that
node.
5.3.7 Reassembling Basic Blocks from DAG's
After we perform whatever optimizations are possible while constructing the DAG or
by manipulating the DAG once constructed, we may reconstitute the three-
address code for the basic block from which we built the DAG. For each node that
has one or more attached variables, we construct a three-address statement that
computes the value of one of those variables. We prefer to compute the result into a
variable that is live on exit from the block. However, if we do not have global
live-variable information to work from, we need to assume that every variable of the
program (but not temporaries that are generated by the compiler to process
expressions) is live on exit from the block.
If the node has more than one live variable attached, then we have to introduce
copy statements to give the correct value to each of those variables. Sometimes,
global optimization can eliminate those copies, if we can arrange to use one of two
variables in place of the other.
Example: If b is not live on exit from the block, then the three statements
a = b + c
d = a - d
c = d + c
suffice to reconstruct the basic block. The third instruction, c = d + c, must use d
as an operand rather than b, because the optimized block never computes b.
If both b and d are live on exit, or if we are not sure whether or not they are live
on exit, then we need to compute b as well as d. We can do so with the sequence
a = b + c
d = a - d
b = d
c = d + c
This basic block is still more efficient than the original. Although the number of
instructions is the same, we have replaced a subtraction by a copy, which tends
to be less expensive on most machines. Further, it may be that by doing a global
analysis, we can eliminate the use of this computation of b outside the block by
replacing it by uses of d. In that case, we can come back to this basic block and
eliminate b = d later. Intuitively, we can eliminate this copy if wherever this value of
b is used, d is still holding the same value. That situation may or may not be true,
depending on how the program recomputes d.

5.4 GLOBAL DATA-FLOW ANALYSIS


Data-flow information can be collected by setting up and solving systems of
equations that relate information at various points in a program. A typical
equation has the form
out[S] = gen[S] ∪ (in[S] - kill[S])    (5.1)
and can be read as, "The information at the end of a statement is either generated
within the statement, or enters at the beginning and is not killed as control flows
through the statement." Such equations are called data-flow equations.
The details of how data-flow equations are set up and solved depend on three
factors.
1. The notions of generating and killing depend on the desired information.
i.e., on the data-flow analysis problem to be solved. Moreover, for some problems,
instead of proceeding along with the flow of control and defining out[S] in terms of
in[S], we need to proceed backwards and define in[S] in terms of out[S].

2. Since data flows along control paths, data-flow analysis is affected by the
control constructs in a program. In fact, when we write out[S], we implicitly
assume that there is a unique end point where control leaves the statement.
3. There are subtleties that go along with such statements as procedure calls,
assignments through pointer variables, and even assignments to array variables.
Points and Paths
Within a basic block, we talk of the point between two adjacent statements, as
well as the point before the first statement and after the last. Thus, block B1 in
Fig. 5.17 has four points: one before any of the assignments and one after each of the
three assignments.

Fig.5.17 Flow Graph


Now, let us take a global view and consider all the points in all the blocks. A
path from P1 to Pn is a sequence of points P1, P2, ..., Pn such that for each i
between 1 and n-1, either
1. Pi is the point immediately preceding a statement and Pi+1 is the point
immediately following that statement in the same block, or
2. Pi is the end of some block and Pi+1 is the beginning of a successor block.
Reaching Definitions

A definition of a variable x is a statement that assigns, or may assign, a


value to x. The most common forms of definition are assignments to x and
statements that read a value from an I/O device and store it in x. These
statements certainly define a value for x, and they are referred to as
unambiguous definitions of x. There are certain other kinds of statements that may
define a value for x; they are called ambiguous definitions. The most usual forms of
ambiguous definitions of x are:

1. A call of a procedure with x as a parameter or a procedure that can access x


because x is in the scope of the procedure. We also have to consider the
possibility of "aliasing," where x is not in the scope of the procedure, but x has
been identified with another variable that is passed as a parameter or is in the
scope.

2. An assignment through a pointer that could refer to x. For example, the


assignment *q:=y is a definition of x if it is possible that q points to x.
We say a definition d reaches a point p if there is a path from the point
immediately following d to p, such that d is not "killed" along that path.
Data-Flow Analysis of Structured Programs
Consider the production
S → id := E | S ; S | if E then S else S | do S while E
E → id + id | id

We define a portion of a flow graph called a region to be a set of nodes N
that includes a header, which dominates all other nodes in the region. All edges
between nodes in N are in the region, except for some that enter the header.
Fig.5.18: Data-flow equations for reaching definitions

Computation of in and out

in is an inherited attribute, and out is a synthesized attribute
depending on in. We intend that in[S] be the set of definitions reaching the
beginning of S, taking into account the flow of control throughout the entire
program, including statements outside of S or within which S is nested. The set
out[S] is defined similarly for the end of S.

Representation of Sets

Sets of definitions, such as gen[S] and kill[S], can be represented compactly


using bit vectors. We assign a number to each definition of interest in the flow
graph. Then the bit vector representing a set of definitions will have 1 in position i if
and only if the definition numbered i is in the set.

The number of a definition statement can be taken as the index of the


statement in an array holding pointers to statements. However, not all definitions
may be of interest during global data-flow analysis; for example, definitions of
temporaries that are used only within a single block usually are not. Therefore, the
numbers of definitions of interest will typically be recorded in a separate table.

A bit-vector representation for sets also allows set operations to be
implemented efficiently. The union and intersection of two sets can be
implemented by logical or and logical and, respectively, basic operations in most
systems-oriented programming languages. The difference A - B of sets A and B can
be implemented by taking the complement of B and then using logical and to
compute A ∧ ¬B.
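For example, with seven definitions d1, ..., d7 stored left to right in the low seven bits of a word (so 111 1111 is 0x7F and {d1, d4} is 0x48), the three set operations become single machine operations; a runnable C sketch:

#include <stdio.h>

typedef unsigned int BitSet;   /* bit 6 = d1, ..., bit 0 = d7 */

int main(void) {
    BitSet in   = 0x7F;        /* 111 1111: all seven definitions */
    BitSet gen  = 0x01;        /* 000 0001: {d7} */
    BitSet kill = 0x48;        /* 100 1000: {d1, d4} */

    BitSet u   = gen | kill;           /* union: logical or */
    BitSet sec = in & kill;            /* intersection: logical and */
    BitSet dif = in & ~kill;           /* difference in - kill: A and (not B) */
    BitSet out = gen | (in & ~kill);   /* out = gen ∪ (in - kill), eq. (5.1) */

    printf("%02X %02X %02X %02X\n", u, sec, dif, out);  /* 49 48 37 37 */
    return 0;
}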

Example: Consider the program for reaching definitions.

The gen and kill sets for the program are shown in Fig. 5.19.

Fig.5.19: The gen and kill sets at nodes in a syntax tree


Consider the node for d7 in the bottom-right corner of Fig. 5.19. The gen set {d7}
is represented by 000 0001 and the kill set {d1, d4} is 100 1000. That is, d7 kills all
the other definitions of i, its variable.

The second and third children of the if node represent the then and else
parts, respectively. Note that the gen set 000 0011 at the if node is the union of
the sets 000 0010 and 000 0001 at the second and third children. The kill set is
empty because the definitions killed by the then and else parts are disjoint.

Data-flow equations for a cascade of statements are applied at the parent of
the if node. The kill set at this node is obtained by
000 0000 ∪ (110 0001 - 000 0011) = 110 0000

Let us compute in and out, starting from the top of the parse tree. We
assume the in set at the root of the syntax tree is empty. Thus out for the left child
of the root is gen of that node, or 111 0000. This is also the value of the in set at
the do node. From the data-flow equations associated with the do production
in Fig. 5.18, the in set for the statement inside the do loop is obtained by taking the
union of the in set 111 0000 at the do node and the gen set 000 1111 at the
statement. The union yields 111 1111, so all definitions can reach the beginning of
the loop body. However, at the point just before definition d5, the in set is 011 1110,
since definition d4 kills d1 and d7.

Use Definition Chains

It is often convenient to store the reaching definition information as "use-


definition chains" or "ud-chains," which are lists, for each use of a variable, of all
the definitions that reach that use. If a use of variable a in block B is preceded by no
unambiguous definition of a, then the ud-chain for that use of a is the set of
definitions in in[B] that are definitions of a. If there are unambiguous definitions of
a within B preceding this use of a, then only the last such definition of a will be on the
ud-chain, and in[B] is not placed on the ud-chain.

5.5 EFFICIENT DATA FLOW ALGORITHMS


Data-flow analysis speed can be increased by the following two approaches:
1. Depth-first ordering in iterative algorithms
2. Structure-based data-flow analysis
The first is an application of depth-first ordering to reduce the number of
passes that the iterative algorithm takes, and the second uses intervals or the
T1 and T2 transformations to generalize the syntax-directed approach.

1. Depth-First Ordering in Iterative Algorithms
• In reaching definitions, available expressions, or live variables, any event of
significance at a node will be propagated to that node along an acyclic path.
• Iterative algorithms can take advantage of this acyclic nature.
• For example, if a definition d is in in[B], then there is some acyclic path
from the block containing d to B such that d is in the in's and out's all
along that path. Similarly, if an expression x+y is not available at the
entrance to block B, then there is some acyclic path that demonstrates that
fact; either the path is from the initial node and includes no statement that
kills or generates x+y, or the path is from a block that kills x+y and along the
path there is no subsequent generation of x+y.
• Finally, for live variables, if x is live on exit from block B, then there is an
acyclic path from B to a use of x, along which there are no definitions of x.
• In each of these cases, paths with cycles add nothing. For example, if a use
of x is reached from the end of block B along a path with a cycle, we can
eliminate that cycle to find a shorter path along which the use of x is still
reached from B.

/* initialize out on the assumption in[B] = ∅ for all B */
(1) for each block B do out[B] := gen[B];
(2) change := true; /* to get the while-loop going */
(3) while change do begin
(4)     change := false;
(5)     for each block B do begin
(6)         in[B] := ∪ out[P], over all predecessors P of B;
(7)         oldout := out[B];
(8)         out[B] := gen[B] ∪ (in[B] - kill[B]);
(9)         if out[B] ≠ oldout then change := true
(10)    end
(11) end
Algorithm to Compute in and out
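The pseudocode translates almost line for line into C over bit-vector sets. The four-block flow graph and its gen/kill vectors below are hypothetical, chosen only to exercise the loop:

#include <stdbool.h>
#include <stdio.h>

#define NB 4
typedef unsigned int BitSet;

/* Hypothetical flow graph: B1 -> B2, B2 -> B3, B3 -> B2, B2 -> B4.
   Seven definitions, bit 6 = d1 ... bit 0 = d7 (illustrative values). */
static const BitSet gen[NB]  = {0x70, 0x08, 0x06, 0x01};
static const BitSet kill[NB] = {0x0F, 0x40, 0x00, 0x48};
static const int npred[NB]     = {0, 2, 1, 1};
static const int pred[NB][2]   = {{0, 0}, {0, 2}, {1, 0}, {1, 0}};

int main(void) {
    BitSet in[NB] = {0}, out[NB];
    for (int b = 0; b < NB; b++)
        out[b] = gen[b];                       /* line (1): assume in[B] empty */

    bool change = true;                        /* line (2) */
    while (change) {                           /* line (3) */
        change = false;                        /* line (4) */
        for (int b = 0; b < NB; b++) {         /* line (5) */
            in[b] = 0;                         /* line (6): union over preds */
            for (int p = 0; p < npred[b]; p++)
                in[b] |= out[pred[b][p]];
            BitSet oldout = out[b];            /* line (7) */
            out[b] = gen[b] | (in[b] & ~kill[b]);   /* line (8) */
            if (out[b] != oldout)              /* line (9) */
                change = true;
        }
    }
    for (int b = 0; b < NB; b++)
        printf("B%d: in=%02X out=%02X\n", b + 1, in[b], out[b]);
    return 0;
}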
Depth-first traversal procedure:
1. First visit the root node of the tree, e.g., node (1).
2. From the current node, visit an unvisited child, going as deep as possible.
3. After reaching the deepest node, visit any missed nodes by backtracking
through their parent nodes.

Fig 5.20: Depth-first traversal for the given tree

Any event of significance at a node will be propagated to that node along an acyclic
path.
The order of visiting the edges in the above tree is:
1 →3 → 4 → 6 → 7 → 8 →10 → 8 → 9 → 8 → 7 → 6 → 4 → 5 →4 →3 → 1→
2→1
Steps:
• After node 4, either 5 or 6 may be visited next; here 6 is visited first.
• After visiting node 10, backtrack to 8 to visit 9.
• The definition d from out[1] will reach in[3], out[3] will reach in[4], and
so on.
2. Structure-Based Data-Flow Analysis
We can implement data-flow algorithms that visit nodes no more times than
the interval depth of the flow graph. The ideas exposed here apply to syntax-
directed data-flow algorithms for all sorts of structured control statements.
This approach also handles regions whose blocks have multiple exits.
• gen R,B indicates the definitions that are generated in the region R and
reach the end of basic block B.
• kill R,B indicates the definitions that are killed in the region R along paths
to the end of basic block B.
• The transfer function trans R,B (S) of a definition set S is the set of definitions
that reach the end of block B by traveling along paths wholly within R.
• The definitions reaching the end of block B fall into two classes.
1. Those that are generated within R and propagate to the end of B
independent of S.
2. Those that are not generated in R, but that also are not killed along some
path from the header of R to the end of B, and therefore are in trans R, B (S) if and only
if they are in S.
Thus, we may write trans in the form:
trans R,B (S) = gen R,B ∪ (S - kill R,B)
Case 1:
If the region R consists of a single basic block B, then the transfer function of R
is the same as the transfer function of block B:
gen B,B = gen[B]
kill B,B = kill[B]
Case 2:
The region R is formed when R1 consumes R2. There are no edges from R2
to R1, and the header of R is the header of R1. R2 does not affect the transfer
function of R1:
gen R,B = gen R1,B
kill R,B = kill R1,B    for all B in R1
Fig 5.21: Region building by T2

For B in R2, a definition can reach the end of B if any of the following conditions
hold:
1. The definition is generated within R2.
2. The definition is generated within R1, reaches the end of some
predecessor of the header of R2, and is not killed going from the header of
R2 to B.
3. The definition is in the set S available at the header of R1, is not killed
going to some predecessor of the header of R2, and is not killed going from the
header of R2 to B.
Therefore, the transfer function in R for those blocks B in R2 may be computed by
gen R,B = gen R2,B ∪ (G - kill R2,B)
kill R,B = kill R2,B ∪ (K - gen R2,B)
where G is the union of gen R1,P, and K the intersection of kill R1,P, taken over all
predecessors P of the header of R2.

Algorithm : Structure- based reaching definitions.


Input : A reducible flow graph G and sets of definitions gen[B] and kill[B] for each
block B of G.
Output: in[B] for each block B
Method:
1. Find the (T1, T2)-decomposition for G.
2. For each region R in the decomposition, from the inside out, compute
genR,B and killR,B for each block B in R
3. If U is the name of the region consisting of the entire graph, then for each
block B, set in[B] to the union, over all predecessors P of block B, of
gen U, P.

Next, consider what happens when a region R is built from a region R1
using transformation T1. The general situation is shown in Fig. 5.22 below; note
that R consists of R1 plus some back edges to the header of R1 (which is also the
header of R, of course). A path going through the header twice would be cyclic
and, as we argued earlier in the section, need not be considered. Thus, all
definitions that reach the end of block B are generated in one of two ways.

Fig 5.22 : Region building by T1

1. The definition is generated within R1 and does not need the back edges
incorporated into R in order to reach the end of B.
2. The definition is generated somewhere within R1, reaches a predecessor
of the header, follows one back edge, and is not killed going from the
header to B.

If we let G be the union of gen R1,P for all predecessors P of the header in R,
then

gen R,B = gen R1,B ∪ (G - kill R1,B)

A definition is killed going from the header to B if and only if it is killed along all
acyclic paths, so the back edges incorporated into R do not cause more
definitions to be killed. That is,

kill R, B = kill R1, B
