COMPILER DESIGN

5 CODE OPTIMIZATION

1. Constant propagation:
In constant propagation, a constant assigned to a variable is propagated (forwarded) through the control-flow graph and substituted at the points where the variable is used.
Example:
In the given code fragment, the value of x can be propagated to the use of x.
x = 3;
y = x + 4;
Below is the piece of code after constant propagation and constant folding techniques
are performed.
x = 3;
y = 7;
Notes:
Some compilers do constant propagation only within basic blocks, while others propagate constants across more complex control flow.
Some compilers perform constant propagation for integer constants but not for floating-point constants.
A few compilers perform constant propagation through bitfield assignments.
A few compilers perform constant propagation for address constants through pointer assignments.
2. Constant Folding
Expressions with constant operands can be evaluated at compile time, improving runtime performance and reducing code size by avoiding evaluation at execution time.
Example:
In the given code fragment, the expression (3+5) can be evaluated at compile-time,
and it is replaced with the constant 8.
int f (void)
{
return 3 + 5;
}

Below is the piece of code after constant folding.
int f (void)
{
return 8;
}

Notes:
Constant folding is a relatively easy optimization.
Generally, programmers don't write expressions like (3 + 5) directly, but these
expressions are relatively common after applying macro expansion or other
optimizations like constant propagation.
All C compilers can perform constant folding, i.e., fold integer constant expressions
after performing macro expansion (ANSI C requirement).
Some environments support floating-point rounding modes that can be changed at run
time dynamically. In such environments, expressions such as (1.0 / 3.0) must be
evaluated at runtime if the rounding mode is unknown at compile time.
Example of Constant propagation:
1. Consider the program below. We want to know, for every point in the program (after every statement), which variables have constant values and which do not. A variable has a constant value at a certain point iff every execution that reaches that point gives that variable the same constant value.
Example-
1. z = 3
2. x = 1
3. while (x > 0) {
4. if (x == 1) then
5. y = 7
6. else
7. y = z + 4
8. x = 3
9. print y
10. }
For this example, there are some simple constant propagation results we can see:
• After line 1, the variable z has the value 3.
• After line 2, the variable x has the value 1.
• After the if-else (at lines 8 and 9), the variable y has the value 7, since both branches assign it 7 (either y = 7 directly, or y = z + 4 with z = 3). So the print statement at line 9 can be replaced by print 7.

3. Liveness Analysis
Definition
– A variable is live at a particular point in the program if its value is used at that point or will be used in the future (otherwise, it is dead).
∴ To calculate the liveness of a variable at a given point, we need to look into the future.
– A program can have an unbounded number of variables.
– It must be executed on a machine with a limited number of registers.
– Two variables can use the same register if they are never simultaneously live.
∴ Register allocation uses liveness information.
Liveness by Example

What is the live range of b?
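(The original flow-graph figure is not reproduced here. The statements below are an assumed reconstruction, shown only so that the discussion has something concrete to point at: b is assigned in statement 2, untouched in statement 3, read in statement 4, and there is a back edge from statement 5 to statement 2.)
1: a = 0
2: b = a + 1        (b is defined here)
3: c = c + 1        (b is neither used nor assigned here)
4: a = b * 2        (b is used here)
5: if a < N goto 2  (back edge 5 → 2)
6: return c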


– Variable b is read in statement 4, so b is live on the (3 → 4) edge.
– Since statement 3 does not assign to b, b is also live on the (2 → 3) edge.
– Statement 2 assigns b, so any value of b on the (1 → 2) and (5 → 2) edges is not needed; b is dead along these edges.
b's live range is therefore (2 → 3 → 4).

Example:
1. A variable x is live at a statement Si in a program if the following three conditions hold simultaneously:
1. There exists another statement Sj that uses x.
2. There is a path from Si to Sj in the control flow graph corresponding to the program.
3. The path has no intervening assignment to x, including at Si and Sj.

The variables that are live both at the statement in basic block 2 and at the statement in basic block 3 of the above control flow graph are
A. p, s, u
B. r, s, u
C. r, u
D. q, v
Answer: C
Explanation: Live variable analysis is used in compilers to find, at each program point, the variables that may be needed in the future.
As per the definition given in the question, a variable is live if it holds a value that may be required in the future, i.e., it is used in the future before any new assignment to it.
4. Common subexpression elimination
In compiler theory, common subexpression elimination (CSE) is a compiler optimization technique that searches for identical expressions (i.e., expressions that all evaluate to the same value) and analyses whether it is worthwhile to replace them with a single variable holding the computed value.
Example:
In the following code fragment:
a = b * c + g;
d = b * c * e;

we can transform the code to:
tmp = b * c;
a = tmp + g;
d = tmp * e;
provided the cost of storing and retrieving (accessing) tmp is less than the cost of recomputing b * c.
Benefits:
Common subexpression elimination is a widely used optimization technique, and the benefits of performing it can be substantial.
In simple cases, programmers may manually eliminate duplicate expressions while writing the code. However, the greatest source of CSEs is the intermediate code generated by the compiler itself, such as array-indexing calculations, where the developer cannot intervene manually. In some cases, language features may also create many duplicate expressions: with C macros, for instance, macro expansion may produce common subexpressions that are not apparent in the source code.
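As a hedged illustration of the array-indexing case (the function and names below are hypothetical, not taken from the text): both occurrences of a[i][j] make the compiler emit the same address computation, roughly i * NCOLS + j scaled by the element size, in its intermediate code, and CSE lets that address be computed only once.
#define NCOLS 100

void negate_if_positive(int a[][NCOLS], int i, int j)
{
    /* each occurrence of a[i][j] expands to the same index
       calculation in the compiler's intermediate code */
    if (a[i][j] > 0)
        a[i][j] = -a[i][j];
}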
Compilers need to be careful about the number of temporaries created to hold values. An excessive number of temporaries increases register pressure, possibly resulting in registers being spilled to memory, which can take more time than simply recomputing an arithmetic result when it is needed.
5. Redundant instruction elimination
At a source code level, the following can be done by the user:
int add_ten(int x)
{
    int y, z;
    y = 10;
    z = x + y;
    return z;
}
can be simplified step by step to:
int add_ten(int x)
{
    int y;
    y = 10;
    y = x + y;
    return y;
}
int add_ten(int x)
{
    int y = 10;
    return x + y;
}
int add_ten(int x)
{
    return x + 10;
}
At compile time, the compiler searches for redundant (duplicate) instructions. A sequence of load and store instructions may carry the same meaning even after some of those instructions are removed. For example:
● MOV x, R0

● MOV R0, R1

We can delete the first instruction and rewrite the second one as:
MOV x, R1
(provided the value in R0 is not needed afterwards).
6. Unreachable code
Unreachable code is a part of the program that can never be executed because of the surrounding programming constructs. Programmers may accidentally write a piece of code that can never be reached.
Example 1:
int add_ten(int x)
{
    return x + 10;
    printf("value of x is %d", x);
}
In this code segment, the printf statement will never execute because the function returns before control can reach it; hence the printf can be removed.
Example 2:
x = a * b;
y = x + c;
z = x * c;
print(z)
Here, y is a dead variable: its value is never used afterwards, so the assignment y = x + c can be eliminated.
The flow of control optimization
There are instances in code where control jumps back and forth without performing any significant task. These jumps can be removed. Consider the following chunk of code:
...
MOV R1, R2
GOTO L1
...
L1 : GOTO L2
L2 : INC R1
In this code, the jump to label L1 can be removed, as L1 merely passes control on to label L2. So, instead of jumping first to L1 and then to L2, control can go directly to L2:
...
MOV R1, R2
GOTO L2
...
L2 : INC R1

7. Algebraic expression simplification
There are instances where algebraic expressions can be simplified. For example, the statement a = a + 0 has no effect and can be removed, and the expression a = a + 1 can be replaced by INC a.
Example 1:
x = y + 0
a = x + b
after algebraic simplification,
x = y
a = x + b
after variable propagation / copy propagation,
a = y + b
Example 2:
x = y + a
z = x - y
c = z + b
after cancellation,
z = a
c = z + b
after copy propagation,
c = a + b
8. Strength reduction
Some operations consume more time and space. We can reduce their 'strength' by replacing them with other operations that require less time and space but produce the same result.
For example, x * 2 can be replaced by x << 1, which involves only a single left-shift operation. Similarly, a² and a * a produce the same result, but a * a is cheaper to compute than a general exponentiation.
Example 1:
x = j * 2 is costlier;
x = j + j or x = j << 1 is cheaper.
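A small hedged sketch of strength reduction applied inside a loop (the function and variable names are illustrative, not from the text): the multiplication by the induction variable i is replaced by a running value that is updated with a cheaper addition.
void fill(int *a, int n)
{
    /* before: a multiplication on every iteration */
    for (int i = 0; i < n; i++)
        a[i] = i * 4;
}

void fill_reduced(int *a, int n)
{
    /* after strength reduction: t always equals i * 4 and is
       maintained with an addition instead */
    for (int i = 0, t = 0; i < n; i++, t += 4)
        a[i] = t;
}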
Accessing machine instructions
The target machine may have more complex instructions that can perform certain specific operations more efficiently. If the target code can use these instructions directly, it improves the quality of the code and produces more efficient results.

RUNTIME ENVIRONMENTS
1. Runtime Environments in Compiler Design
A translator needs to relate the static source code of a program with the dynamic actions (activities) that must occur at runtime to implement the program. The program consists of names for identifiers, procedures, etc., which must be mapped to actual memory locations at runtime.
The runtime environment is the state of the target machine, which may include environment variables, software libraries, etc., that provide services to the processes running in the system.
1.1 Activation Tree
A program consists of procedures, and a procedure definition is a declaration that, in its simplest form, associates an identifier (the procedure name) with a statement (the body of the procedure). Each execution of the procedure is referred to as an activation of the procedure.
The lifetime of an activation is the sequence of steps in the execution of the procedure. If 'x' and 'y' are two procedures, their activations are either non-overlapping (one completes before the other is called) or nested (one begins and ends within the other).
A procedure is recursive if a new activation can begin before an earlier activation of the same procedure has ended. An activation tree depicts the way control enters and leaves activations.
Properties of activation trees are:
Each node represents an activation of a procedure.
The root node represents the activation of the main function.
The node for procedure 'x' is the parent of the node for procedure 'y' if control flows from procedure x to procedure y (x calls y).
To understand this idea, consider a piece of code:
...
printf("Enter Your Name: ");
scanf("%s", username);
show_data(username);
printf("Press any key to continue...");
...
int show_data(char *user)
{
    printf("Your name is %s", user);
    return 0;
}
...

The activation tree of the code is given below.

We have seen that procedure activations begin and end in a depth-first (last-in, first-out) fashion; thus, stack allocation is the suitable form of storage for procedure activations.
Storage Allocation
The runtime environment maintains runtime memory for the following:
● Code: the text part of a program, which does not change at runtime. Its memory requirements are known at compile time.
● Procedures: their text part is static, but they are called in an order that is not known in advance. That is why stack storage is used to maintain procedure calls and activations.
● Variables: variables are known only at runtime, unless they are global or constant. The heap allocation technique is used to manage the allocation and deallocation of memory for such variables at runtime.
Example – Let us consider the program of Quicksort as follows:

main() {
    int n;
    readarray();
    quicksort(1, n);
}

quicksort(int m, int n) {
    int i = partition(m, n);
    quicksort(m, i - 1);
    quicksort(i + 1, n);
}

The activation tree for this program is as follows:

The main function acts as the root; main calls readarray and quicksort. Quicksort, in turn, calls partition and then quicksort again. The flow of control in the program corresponds to a depth-first traversal of the activation tree, starting at the root.
1.2 CONTROL STACK AND ACTIVATION RECORDS
The runtime stack or control stack is used to keep track of live procedure activations, i.e., procedures whose execution has not yet completed. A procedure's activation record is pushed onto the stack when the procedure is called (its activation begins) and popped off the stack when it returns (its activation ends).
The information required by a single execution of a procedure is managed and maintained using an activation record or frame. When a procedure is called, its activation record is pushed onto the stack, and when control returns to the caller, the activation record is popped off the stack.

1.3 A general activation record consists of the following fields:
● Temporaries: stores temporary and intermediate values of expressions.
● Local Data: stores the local data of the called procedure.
● Machine Status: stores the machine status (registers, program counter, etc.) as it was before the procedure was called.
● Control Link: stores the address of the activation record of the caller procedure.
● Access Link: stores information about data that is outside the local scope (a link to non-local data).
● Actual Parameters: stores the actual parameters, i.e., the parameters used to send input to the called procedure.
● Return Value: stores the return value.
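Purely as an illustration (the field names, types, and sizes below are assumptions, not any real compiler's layout), the fields above can be pictured as a C struct; in practice the compiler lays the frame out implicitly on the runtime stack.
struct activation_record {
    long  return_value;           /* value handed back to the caller        */
    long  actual_params[4];       /* parameters supplied by the caller      */
    void *access_link;            /* frame of the lexically enclosing scope */
    void *control_link;           /* frame of the (dynamic) caller          */
    long  saved_machine_state[8]; /* saved registers, program counter, ...  */
    long  local_data[8];          /* locals of the called procedure         */
    long  temporaries[8];         /* intermediate values of expressions     */
};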

Whenever a procedure is called and executed, its activation record is stored on the stack, also known as the control stack. When a procedure calls another procedure, the caller's execution is suspended until the called procedure finishes its execution; during that time, the activation record of the called procedure is on the stack.
We assume that program control flows sequentially and that, when a procedure is called, control is transferred to the called procedure. When the called procedure finishes, it returns control to the caller. This type of control flow makes it easy to represent the sequence of activation records as a tree, which is called an activation tree.
Control stack for the above quicksort procedure:


1.4 Runtime storage can be subdivided to hold:
● Target code: the program code is static, as its size can be determined at compile time.
● Static data objects
● Dynamic data objects (heap)
● Automatic data objects (stack)


I. Static Storage Allocation
● Memory for the program is allocated at compile time and placed in the static area.
● Memory is allocated only once, at compile time, for the whole run of the program.
● This allocation strategy does not support dynamic data structures: memory is allocated at compile time and deallocated only after the program completes.
● One drawback is that it does not support recursion.
● Another drawback is that it requires the size of the data to be known at compile time.
E.g., FORTRAN was designed to permit static storage allocation.

II. Stack Storage Allocation
● Storage is organized as a stack, and activation records are pushed and popped as activations begin and end, respectively. Local variables are stored in activation records, so they get fresh storage in each activation.
● It supports recursion.

III. Heap Storage Allocation
● Memory allocation and deallocation can be done at any time and at any place, depending on the program's requirements.
● Heap allocation is used to allocate memory dynamically to variables and to reclaim it when they are no longer required.
● Recursion is supported.
A short C sketch contrasting the three allocation strategies follows.
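A minimal C sketch of the three strategies (the names and values are hypothetical): counter uses static allocation, result lives on the stack in every activation of factorial, and buf is allocated on the heap at run time.
#include <stdio.h>
#include <stdlib.h>

static int counter;              /* static allocation: reserved at compile time */

int factorial(int n)
{
    int result;                  /* automatic (stack) allocation: every
                                    activation gets a fresh copy, which is
                                    what makes the recursion work */
    counter++;
    result = (n <= 1) ? 1 : n * factorial(n - 1);
    return result;
}

int main(void)
{
    int *buf = malloc(10 * sizeof *buf);   /* heap allocation: size and    */
    if (buf == NULL)                       /* lifetime decided at run time */
        return 1;
    buf[0] = factorial(5);
    printf("%d computed using %d activations\n", buf[0], counter);
    free(buf);                             /* explicit deallocation */
    return 0;
}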


Variables that are local to a procedure call are allocated and deallocated at runtime only.
Except for the statically allocated memory areas, both stack and heap memory can grow and shrink dynamically and unpredictably; therefore, the heap is not given a fixed amount of memory by the system.

As depicted in the image above, the text part of the code is allocated a fixed amount of memory, while the stack and the heap are arranged at opposite ends of the total memory allocated to the program and grow toward each other.

1.5 PARAMETER PASSING
The communication medium between procedures is known as the parameter
passing mechanism. The values of the variables are transferred from a calling
procedure to the called procedure by using some mechanism.
Example:
1. Dynamic binding occurs during:
A. Compile time
B. Run time
C. Linking time
D. Pre-processing time
Ans. B
Solution
Dynamic binding occurs during run time.
2. Heap allocation is required for languages that:
A. Use dynamic scope rules
B. Support dynamic data structures
C. Support recursion
D. Support recursion and dynamic data structures
Ans. B
Heap allocation is required for languages that support dynamic data structures.
3. Consider the program given below in a block-structured pseudo-language
with lexical scoping and nesting of procedures permitted.
Program main;
Var ...
Procedure A1;
Var ...
Call A2;
End A1
Procedure A2;
Var ...
Procedure A21;
Var ...
Call A1;
End A21
Call A21;
End A2
Call A1;
End main.

Consider the calling chain: Main → A1 → A2 → A21 → A1
The correct set of activation records along with their access links is given by

A. B.

C. D.

Answer: D.

The access link is defined as a link to the activation record of the closest lexically enclosing block in the program text, so the closest enclosing blocks for A1, A2, and A21 are Main, Main, and A2, respectively. Activation records are created at procedure entry time and destroyed at procedure exit time.

Solution:
The access link is a link to the activation record of the closest lexically enclosing block in the program text; it depends on the static program text, not on the calling sequence.
Here, the calling sequence is Main → A1 → A2 → A21 → A1.
A1 and A2 are defined inside Main, so the access links of A1 and A2 point to Main.
A21 is defined inside A2, hence its access link points to A2.
Parameter Passing
The communication medium between procedures is known as the parameter passing
mechanism. Variable values from a calling procedure are sent to the called procedure by
some technique. Some basic terminologies:
r-value
The value of an expression is its r-value. The value contained in a single variable also becomes an r-value when the variable appears on the right-hand side of an assignment operator. An r-value can be assigned to another variable.
l-value
The memory location (address) where an expression's value is stored is the l-value of that expression. An l-value can appear on the left-hand side of an assignment operator.
Example:
day = 1;
week = day * 7;
month = 1;
year = month * 12;
This example shows that constants like 1, 7, and 12 and variables like day, week, month, and year all have r-values, but only the variables have l-values, because only they represent memory locations.
For example:
7 = x + y;
is an l-value error, as the constant 7 does not represent any memory location.
Formal Parameters
Variables that collect the information passed by the caller procedure are known as formal
parameters. Such variables are declared in the definition of the called function.
Actual Parameters
Variables whose addresses or values are passed to the called procedure are known as actual
parameters. Such variables are stated in the function call as an argument.

Example:
fun_one()
{
    int actual_parameter = 10;
    fun_two(actual_parameter);
}
fun_two(int formal_parameter)
{
    print formal_parameter;
}
What the formal parameter holds depends on the parameter-passing technique used; it may be an address or a value.

Pass by Value
In this mechanism, the calling procedure passes the r-value of the actual parameter, and the compiler puts that value into the called procedure's activation record. The formal parameter then holds the value passed by the calling procedure. Any change to the value held by the formal parameter does not affect the actual parameter.
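A minimal sketch in C (function and variable names are illustrative): C passes arguments by value, so the assignment inside set_to_zero changes only the local copy.
#include <stdio.h>

void set_to_zero(int n)      /* n receives a copy of the actual parameter */
{
    n = 0;                   /* modifies only the local copy */
}

int main(void)
{
    int x = 42;
    set_to_zero(x);
    printf("%d\n", x);       /* prints 42: x is unchanged */
    return 0;
}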

Pass by Reference
In this mechanism, the l-value (address) of the actual parameter is copied into the activation record of the called procedure. The called procedure therefore has the address of the actual parameter, and the formal parameter refers to the same memory location. So, if the value referred to by the formal parameter is changed, the change is visible in the actual parameter, since both refer to the same location.
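A minimal sketch in C (names are illustrative): C itself only passes values, but passing the address of the actual parameter through a pointer gives the effect of pass by reference.
#include <stdio.h>

void set_to_zero(int *n)     /* n holds the address (l-value) of the actual parameter */
{
    *n = 0;                  /* writes through to the caller's variable */
}

int main(void)
{
    int x = 42;
    set_to_zero(&x);
    printf("%d\n", x);       /* prints 0: x was modified */
    return 0;
}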

Pass by Copy-restore
This mechanism is similar to pass-by-reference, except that the changes are reflected in the actual parameters only when the called procedure finishes. On the function call, the values of the actual parameters are copied into the activation record of the called procedure. Changes to the formal parameters have no immediate effect on the actual parameters while the procedure runs, but when the called procedure finishes, the final values of the formal parameters are copied back into the l-values of the actual parameters.

Example:
int y;
calling_procedure()
{
y = 10;
copy_restore(y); //l-value of y is passed
printf y; //prints 99
}
copy_restore(int x)
{
x = 99; // y still has value 10 (unaffected)
y = 0; // y is now 0
}
When this function completes, the value of the formal parameter 'x' is copied back into the actual parameter 'y'. So even though 'y' was set to 0 inside the procedure, the copy-back at the end leaves y equal to 99, making copy-restore behave like call by reference in this case.

Pass by Name
Languages like ALGOL provide a parameter-passing mechanism that works somewhat like the preprocessor in the C language. In this mechanism, a call behaves as if the body of the called procedure were substituted at the call site, with the argument expressions substituted for the corresponding formal parameters throughout that body. Pass-by-name therefore operates on the actual parameters, much like pass-by-reference.
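A rough analogy in C (the macro and function names are hypothetical): a function-like macro substitutes its argument expression textually, so the expression is re-evaluated at every use of the 'parameter', which is the characteristic behaviour of pass-by-name.
#include <stdio.h>

#define TWICE(e) ((e) + (e))     /* 'e' is substituted textually */

static int calls = 0;
static int next(void) { return ++calls; }

int main(void)
{
    /* TWICE(next()) expands to ((next()) + (next())): the argument
       expression is evaluated once per use, so this prints 3 (1 + 2),
       not 2. */
    printf("%d\n", TWICE(next()));
    return 0;
}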
The symbol table is a data structure created and maintained by the compiler to store information about various entities such as function names, variable names, classes, objects, interfaces, etc. The symbol table is used by both the analysis and the synthesis parts of a compiler.
Depending on the language, a symbol table may serve the following purposes:
● Storing the names of all the entities in a structured manner in one place.
● Checking whether a variable has been declared.
● Implementing type checking by verifying the assignments and expressions in the source code.
● Determining the scope of a name (scope resolution).
A symbol table is simply a table, which can be either a hash table or a linear table. It maintains an entry for every name in the format given below:


<symbol name, type, attribute>


Example: Suppose a symbol table has to store complete information about the variable declaration given below:
static int interest;
Then it should store the entry in the following manner:
<interest, int, static>
The attribute field holds the information related to the name.

Implementation
If a compiler has to handle only a small amount of data, the symbol table can be implemented as an unordered list, which is easy to code but suitable only for small tables. A symbol table can be implemented in any of the following ways:
● Hash table
● Binary search tree
● Linear (sorted or unsorted) list
Symbol tables are mostly implemented as hash tables, where the source-code symbol itself is treated as the key for the hash function and the information about the symbol is the value.

Operations
A symbol table, whether hash-based or linear, should provide the following operations.

insert()
This operation is used more frequently by the analysis phase, i.e., the first half of the compiler, where tokens are identified and names are stored in the symbol table. It is used to add information about a unique/new name occurring in the source program. The format/structure in which names are stored depends on the compiler.
An attribute of a symbol is the information associated with that symbol in the source code, such as its value, scope, state, and type. The insert() function takes a symbol and its attributes as arguments and stores the information in the symbol table.
Example:
int a;
should be processed by the compiler as:


insert(a, int);

lookup()
The lookup() operation is used to search for a name in the symbol table, for example to determine whether:
● the symbol exists in the table;
● the symbol is declared before it is used;
● the name is used within its scope;
● the symbol is initialized;
● the symbol is declared multiple times.
The exact form of the lookup() function differs from one programming language to another, but its basic structure matches the following format:

lookup(symbol)
This method returns 0 if the symbol does not exist in the symbol table. If the symbol is present, it returns the attributes stored for it in the symbol table.
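A minimal sketch in C of a hash-based symbol table providing insert() and lookup() (the structure fields, the toy hash function, and the names are assumptions for illustration, not a prescribed design):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64

struct symbol {
    const char    *name;        /* key: the identifier from the source */
    const char    *type;        /* attribute: e.g. "int"               */
    const char    *attribute;   /* attribute: e.g. "static"            */
    struct symbol *next;        /* chaining for hash collisions        */
};

static struct symbol *table[NBUCKETS];

static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* insert(): add information about a new name and its attributes. */
static void insert(const char *name, const char *type, const char *attribute)
{
    struct symbol *sym = malloc(sizeof *sym);
    unsigned h = hash(name);
    if (sym == NULL)
        return;
    sym->name = name;
    sym->type = type;
    sym->attribute = attribute;
    sym->next = table[h];
    table[h] = sym;
}

/* lookup(): return the entry for a name, or NULL (0) if it is absent. */
static struct symbol *lookup(const char *name)
{
    for (struct symbol *s = table[hash(name)]; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0)
            return s;
    return NULL;
}

int main(void)
{
    insert("interest", "int", "static");        /* static int interest; */
    struct symbol *s = lookup("interest");
    if (s != NULL)
        printf("<%s, %s, %s>\n", s->name, s->type, s->attribute);
    return 0;
}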

Scope Management
A compiler maintains two kinds of symbol tables: a global symbol table, which is accessible to all the procedures, and scope symbol tables, one created for each scope in the program.
To determine the scope of a name, the symbol tables are arranged in a hierarchical structure, as shown in the example below:
...
int value=10;

void pro_one()
{
int one_1;
int one_2;

{ \
int one_3; |_ inner scope 1
int one_4; |
} /

int one_5;

{ \

int one_6; |_ inner scope 2
int one_7; |
} /
}

void pro_two()
{
int two_1;
int two_2;

{ \
int two_3; |_ inner scope 3
int two_4; |
} /

int two_5;
}
...

We can represent the above program in a hierarchical structure of symbol tables:

The global symbol table contains one global variable and two procedure names, which are available to all the child scopes shown above. The names in the pro_one symbol table (and in its child tables) are not available to pro_two and its child tables.

The hierarchy of symbol tables is maintained by the semantic analyzer. Whenever a name has to be searched in a symbol table, it is searched using the following algorithm (a small C sketch follows the list):
● First, the symbol is searched in the current scope, i.e., the current symbol table.
● If the name is found, the search is complete; otherwise, it is searched in the parent symbol table,
● until either the name is found or the global symbol table has been searched.
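A minimal sketch of that search in C (the struct fields and the function name are illustrative assumptions): each scope keeps a pointer to its enclosing scope, and the lookup walks outward until the name is found or the global table has been examined.
#include <string.h>
#include <stddef.h>

struct symbol {
    const char    *name;
    struct symbol *next;      /* next entry declared in the same scope    */
};

struct scope {
    struct symbol *symbols;   /* entries declared in this scope           */
    struct scope  *parent;    /* enclosing scope; NULL for the global one */
};

/* Search the current scope first, then each enclosing scope in turn. */
struct symbol *lookup_in_scope(struct scope *s, const char *name)
{
    for (; s != NULL; s = s->parent)
        for (struct symbol *sym = s->symbols; sym != NULL; sym = sym->next)
            if (strcmp(sym->name, name) == 0)
                return sym;   /* found in the nearest enclosing scope */
    return NULL;              /* not found, even in the global table  */
}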
Source code can be translated directly into its target machine code, so why do we need to translate it into an intermediate code first and then translate that into target code? Let us look at the reasons why an intermediate code is required.

● If a compiler translates the source language directly into target machine code without generating intermediate code, then a full native compiler is required for every new target machine.
● Intermediate code eliminates the need for a new full compiler for every machine, because the analysis portion stays the same for all compilers.
● Only the second part of the compiler, the synthesis part, is changed according to the target machine.
● It also becomes easier to improve the code's performance by applying code-optimization techniques to the intermediate code.

****
