Institute of Engineering & Management: Department of Information Technology Workbook (IT605D)
1. MAKAUT Syllabus
Paper name: Compiler Design
Code: IT605C
Contacts: 3L +1T
Credits: 4
Pre-requisites:
CS402 (Formal Language & Automata Theory)
CS201 (Basic Computation and Principles of C)
CS302 (Data Structure & Algorithm)
Syllabus topics (excerpt): Issues in the design of code generator, a simple code generator, Register allocation & assignment.
2. Recommended Books:
Aho, Sethi, Ullman - “Compiler Principles, Techniques and Tools” - Pearson Education.
Holub - “Compiler Design in C” - PHI.
3. Course Outcomes:
Understand lexical analyzer and parser generator tools.
Build symbol tables and generate intermediate code.
Generate assembly code for a RISC machine.
Implement a parser, such as a bottom-up SLR parser, and incorporate semantic rules into a parser
that performs attribution while parsing.
Understand compiler architecture, register allocation and compiler optimization.
4. Course Information
PROGRAMME: Information technology DEGREE: B.Tech
DAY 1
Course: Compiler Design
Relevant MAKAUT syllabus portion: Storage organization (Subdivision of run-time
memory, Activation records)
Objectives:
1. To understand subdivision of runtime memory and how the memory subdivisions are
utilized.
2. To understand activation record structure.
Notes:
Run-Time Environments
• The abstractions embodied in the source language definition are - names, scopes, bindings,
data types, operators, procedures, parameters, and flow-of-control constructs.
• A compiler must accurately implement these abstractions and also must cooperate with the
operating system and other systems software to support these abstractions on the target
machine.
• To do so, the compiler creates and manages a run-time environment in which it assumes its
target programs are being executed.
Storage Organization
Code area
Static area
Heap area
Stack area
Activation Record
The activation record is a block of memory used for managing information needed by a
single execution of a procedure.
Various fields of activation record are as follows:
Temporary Values
Local Variables
Saved Machine Registers
Control Links
Access Links
Actual Parameters
Return Values
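The fields listed above can be pictured as a C struct; the field names, types, and sizes here are illustrative assumptions, since actual activation-record layouts are target- and compiler-dependent:

```c
#include <stddef.h>

/* Illustrative activation-record layout. Field order follows the list
   above; real layouts are target- and compiler-dependent. */
struct activation_record {
    long  return_value;        /* space for the result returned to the caller */
    long  actual_params[4];    /* arguments supplied by the caller            */
    void *control_link;        /* points to the caller's activation record    */
    void *access_link;         /* points to the frame of the enclosing scope  */
    long  saved_registers[8];  /* machine state restored on return            */
    long  locals[8];           /* local variables of this activation          */
    long  temporaries[8];      /* compiler-generated temporary values         */
};
```

With this picture, `offsetof` gives each field's offset within the frame, which is how generated code addresses locals relative to a frame pointer.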
2. Explain the subdivisions of logical memory for executing the target code.
DAY 2
Course: Compiler Design
Relevant MAKAUT syllabus portion: Activation trees, Control stack
Objectives:
1. To understand activation tree and control stack.
Notes:
Activation Trees
Stack allocation would not be feasible if procedure calls, or activations of procedures, did not
nest in time.
If an activation of procedure p calls procedure q, then that activation of q must end before the
activation of p can end. There are three common cases:
1. The activation of q terminates normally. Then in essentially any language, control resumes
just after the point of p at which the call to q was made.
2. The activation of q, or of some procedure that q called, either directly or indirectly, aborts; i.e., it
becomes impossible for execution to continue. In that case, p ends simultaneously with q.
3. The activation of q terminates because of an exception that q cannot handle.
Procedure p may handle the exception, in which case the activation of q has terminated while
the activation of p continues, although not necessarily from the point at which the call to q
was made. If p cannot handle the exception, then this activation of p terminates at the same
time as the activation of q, and presumably the exception will be handled by some other open
activation of a procedure.
Control Stacks
Control stack keeps track of live procedure activations. The idea is to push the node for
activation onto the control stack as the activation begins and to pop the node when the
activation ends. Then the contents of the control stack are related to the path of the activation
tree. When node n is at the top of the control stack, the stack contains the nodes along the
path from n to the root.
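The push/pop discipline can be sketched with a toy control stack in C; the procedure names and the fixed stack size are illustrative assumptions:

```c
/* Toy control stack: push an activation's name when it begins,
   pop it when it ends. */
static const char *ctl_stack[32];
static int ctl_top = 0;   /* number of currently live activations */

static void begin_activation(const char *proc) { ctl_stack[ctl_top++] = proc; }
static void end_activation(void)               { ctl_top--; }

/* Simulate main calling p, and p calling q: while q runs, the stack
   holds the path main -> p -> q from the root of the activation tree. */
static int depth_during_q(void) {
    begin_activation("main");
    begin_activation("p");
    begin_activation("q");
    int depth = ctl_top;     /* 3: main, p and q are all live */
    end_activation();        /* q returns  */
    end_activation();        /* p returns  */
    end_activation();        /* main ends  */
    return depth;
}
```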
1. By taking example of factorial program explain how activation record will look like for
every recursive call in case of factorial (3).
2. Write a quick sort procedure. Draw the activation tree when the numbers 9, 8, 7, 6, 5, 4,
3, 1 are sorted. What is the largest number of activation records that can appear on the stack?
3. What is the purpose of control stack used in run time storage organization?
DAY 3
Course: Compiler Design
Relevant MAKAUT syllabus portion: Scope of declaration, Binding of names, Storage
allocation strategies
Objectives:
1. To understand scope of declaration and scope rules.
2. To understand stack allocation and heap allocation.
Notes:
Binding of Names
Even if each name is declared once in a program, the same name may denote different data
objects at run time. Each data object corresponds to a storage location that can hold
values.
A binding is the dynamic counterpart of a declaration. The dynamic notions that correspond to static ones are:
1. activations of a procedure (counterpart of its definition)
2. bindings of a name (counterpart of its declaration)
3. lifetime of a binding (counterpart of the scope of the declaration)
2. Using the scope rules of Pascal, determine the declarations that apply to each occurrence of
the names a and b in the code segment below. The output of the program consists of the
integers 1 through 4.
3. What is the output of the following C program if the compiler uses dynamic scope? Briefly
justify your answer.
int r;
void write(void){
printf("%d", r);
}
void display(void){
int r = 37.24;
write();
}
main(){
r = 11.34;
write();
display();
}
DAY 4
Course: Compiler Design
Relevant MAKAUT syllabus portion: Parameter passing (call by value, call by reference,
copy restore, call by name)
Objectives:
1. To understand different parameter passing technique for procedure call.
Notes:
Parameter Passing
There are two types of parameters-
- Formal Parameter
- Actual Parameter
Based on these parameters there are various parameter passing methods, the most common
methods are:
1. call by value
2. call by reference
3. call by value-result
4. call by name
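C itself passes parameters only by value; call by reference is commonly simulated with pointers, while call by value-result and call by name have no direct C equivalent. A minimal sketch of the contrast (the function names are assumptions for illustration):

```c
/* Call by value: the callee receives a copy, so the caller's
   variable is unchanged after the call. */
static void inc_by_value(int x) { x += 1; }

/* Call by reference, simulated in C with a pointer: the callee
   updates the caller's variable through its address. */
static void inc_by_reference(int *x) { *x += 1; }
```

For example, after `int a = 1; inc_by_value(a);` the variable `a` is still 1, but `inc_by_reference(&a);` leaves it at 2.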
2. Write the output of the following C program for each of the following parameter passing
methods (ignore the parameter passing syntax).
i) call by value ii) call by reference
iii) call by value-result iv) call by name
int i;
int j;
void p(int x, int y){
x += 1;
i += 1;
y += 1;
}
void swap(int x, int y){
int a[2] = {1, 1};
int b[3] = {1, 2, 0};
p(a[i], a[i]);
printf("%d, %d", a[0], a[1]);
swap(j, a[j]);
printf("%d, %d, %d", b[0], b[1], b[2]);
return 0;
}
DAY 5
Course: Compiler Design
Relevant MAKAUT syllabus portion: Symbol tables, dynamic storage allocation
techniques.
Objectives:
1. To understand symbol table: use, construction and management.
2. To understand dynamic memory allocation.
Notes:
Symbol Tables
- A compiler uses a symbol table to keep track of scope and binding information about
names.
- The table is searched every time a name is encountered in source code.
- A symbol-table mechanism must allow us to add new entries and find existing entries
efficiently. We evaluate each scheme on the basis of the time required to add n entries and
make e inquiries.
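One such mechanism can be sketched as a chained hash table in C; the hash function, bucket count, and the `scope_depth` field standing in for binding information are illustrative assumptions:

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64

struct st_entry {
    char            *name;
    int              scope_depth;   /* example binding information */
    struct st_entry *next;          /* chain within the bucket     */
};

static struct st_entry *buckets[NBUCKETS];

static unsigned st_hash(const char *s) {
    unsigned h = 0;
    while (*s) h = h * 31u + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Find an existing entry, or NULL if the name is unknown. */
static struct st_entry *st_lookup(const char *name) {
    for (struct st_entry *e = buckets[st_hash(name)]; e; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}

/* Add a new entry at the head of its bucket chain. */
static struct st_entry *st_insert(const char *name, int depth) {
    unsigned h = st_hash(name);
    struct st_entry *e = malloc(sizeof *e);
    e->name = malloc(strlen(name) + 1);
    strcpy(e->name, name);
    e->scope_depth = depth;
    e->next = buckets[h];
    buckets[h] = e;
    return e;
}
```

With chaining, adding n entries is O(n) and each inquiry is proportional to the length of one bucket chain.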
Explicit Allocation
o Explicit Allocation for Fixed Sized Blocks
o Explicit Allocation for Variable Sized Blocks
Implicit Allocation
DAY 6
Course: Compiler Design
Relevant MAKAUT syllabus portion: Intermediate languages, Graphical representation
Objectives:
1. To understand different graphical representation of intermediate language
Notes:
Intermediate Code Generation
In the analysis-synthesis model of a compiler, the front end translates a source program into
an intermediate representation from which the back end generates target code.
Benefits of a machine-independent intermediate form are:
1. Retargeting is facilitated; a compiler for a different machine can be created by attaching a
back end for the new machine to an existing front end.
2. A machine-independent code optimizer can be applied to the intermediate representation.
Graphical Representation
A syntax tree depicts the natural hierarchical structure of the source program.
A DAG gives the same information but in a more compact way because common
subexpressions are identified.
Syntax tree: represents constructs in the source program; the children of a node represent the
meaningful components of the construct.
DAG (directed acyclic graph): identifies the common subexpressions (subexpressions that
occur more than once) of the expression.
• More compact than a syntax tree.
3. Translate the arithmetic expression a * - (b + c) into syntax tree and postfix notation.
4. Design syntax tree and postfix notation for the following expression:
(a + (b * c)) ^ d – e / (f + g)
DAY 7
Course: Compiler Design
Relevant MAKAUT syllabus portion: Three-address code, Quadruples, Triples, Indirect
triples.
Objectives:
1. To understand three-address code representation for source code.
Notes
Three-Address Code
In three-address code, there is at most one operator on the right side of an instruction; that is,
no built-up arithmetic expressions are permitted.
x + y * z might be translated into the sequence of three-address instructions:
t1 = y * z
t2 = x + t1
where t1 and t2 are compiler-generated temporary names.
“Three-address code is a linearized representation of a syntax tree or a DAG in which explicit
names correspond to the interior nodes of the graph.”
Three-address instructions
- Three-address instructions specify the components of each type of instruction, but they do
not specify the representation of these instructions in a data structure.
- In a compiler, these instructions can be implemented as objects or as records with
fields for the operator and the operands.
- Three such representations are called "quadruples," "triples," and "indirect triples."
Quadruples
Quadruple has four fields, which we call op, arg1, arg2, and result. The op field contains an
internal code for the operator.
Triples
Triple has only three fields, which we call op, arg1, and arg2
Indirect Triples
Indirect triples consist of a listing of pointers to triples, rather than a listing of the triples
themselves. For example, we can use an instruction array to list pointers to triples in the desired
order.
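The three representations can be sketched as C records for the earlier example t1 = y * z; t2 = x + t1; the string encoding of operands and of triple references is an illustrative assumption:

```c
/* Quadruple: explicit result field. */
struct quad   { const char *op, *arg1, *arg2, *result; };

/* Triple: the result is the triple's own position, so temporaries
   are referred to by index (encoded here as "(0)", "(1)", ...). */
struct triple { const char *op, *arg1, *arg2; };

static struct quad quads[] = {
    { "*", "y", "z",  "t1" },
    { "+", "x", "t1", "t2" },
};

static struct triple triples[] = {
    { "*", "y", "z"   },
    { "+", "x", "(0)" },   /* (0) refers to the value of triple 0 */
};

/* Indirect triples: a separate list of pointers fixes the desired
   order, so instructions can be reordered without renumbering. */
static struct triple *indirect[] = { &triples[0], &triples[1] };
```

Note the trade-off the structures make visible: quadruples name their results explicitly, triples save the result field but tie references to positions, and indirect triples restore freedom of movement via the pointer list.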
1. Translate the following expression A = B * - C + B * - C into quadruples and triples
separately.
5. Distinguish between quadruples, triples and indirect triples for the expression.
x = y * −z + y * −z
DAY 8
Course: Compiler Design
Relevant MAKAUT syllabus portion: Three-address code
Objectives:
1. To get introduced to implementation of Three-Address Statements.
Notes:
Types of Three Address Statements
Declarative Statements
Assignment Statements
Arrays
Boolean Expression
Flow Control Statement
Case Statement
Procedure Call
2. Consider the following code fragment. Generate the three address code for it.
switch(a + b){
case 1: x = x + 1;
case 2: y = y + 2;
case 3: z = z + 3;
default: c = c – 1;
}
3. Write syntax directed translation for the flow-of-control statement – i) if – then, ii) if-then-
else, iii) while, and iv) for using the translation, convert the following statement to three
address code.
if (x > 10) then
while (a > 10)
y=x+a
else if (y > 100)
y = 1;
DAY 9
Course: Compiler Design
Relevant MAKAUT syllabus portion: Three-address code
Lecture 9 (60 minutes)
Objectives:
1. To understand how backpatching is used to generate code for Boolean expressions and flow-
of-control statements in one pass.
Notes:
- “Backpatching is the activity of filling up unspecified information of labels using
appropriate semantic actions during code generation process.”
- Implementing a syntax-directed definition in two passes is the most convenient
method.
- If we decide to generate the three-address code for a given syntax-directed definition in a
single pass, the main problem that occurs is deciding the addresses of the labels.
- The jump (goto) statements refer to these labels, and in one pass it becomes difficult
to know the locations of the label statements.
- If we use two passes instead of one, then in the first pass we can leave these addresses
unspecified, and in the second pass this incomplete information can be filled in.
- To overcome the problem of processing incomplete information in one pass, the
backpatching technique can be used.
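Backpatching is usually implemented with three helper routines, commonly called makelist, merge, and backpatch. This C sketch over a table of generated jump instructions uses an illustrative quad encoding and list structure:

```c
#include <stdlib.h>

/* Sketch of backpatching helpers over a table of generated jump
   instructions; the encoding and sizes are illustrative. */
struct jq { const char *op; int target; };  /* target = label to fill in */
static struct jq jcode[100];

struct bp_node { int quad_index; struct bp_node *next; };

/* makelist(i): a new list containing only quad index i. */
static struct bp_node *makelist(int i) {
    struct bp_node *n = malloc(sizeof *n);
    n->quad_index = i;
    n->next = NULL;
    return n;
}

/* merge(a, b): concatenation of the two lists. */
static struct bp_node *merge(struct bp_node *a, struct bp_node *b) {
    if (!a) return b;
    struct bp_node *p = a;
    while (p->next) p = p->next;
    p->next = b;
    return a;
}

/* backpatch(list, label): fill label in as the jump target of every
   quad on the list. */
static void backpatch(struct bp_node *list, int label) {
    for (; list; list = list->next)
        jcode[list->quad_index].target = label;
}
```

For example, two unfilled gotos at quad indices 0 and 2 can be kept on one list and patched to label 7 once that label's address becomes known.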
DAY 10
Course: Compiler Design
Relevant MAKAUT syllabus portion: The principal sources of optimization
Objectives:
1. To understand the principal source of optimization in target code.
Notes
Principal Sources of Optimization
The optimization can be done locally or globally. A transformation is local if it is applied
within a single basic block; otherwise, the transformation is global.
Function preserving transformations
There are a number of ways in which a compiler can improve a program without
changing the function it computes.
Common subexpression elimination, copy propagation, dead-code elimination, and
constant folding are common examples of such function-preserving transformations.
1. Compile Time Evaluation
1.1 Folding
1.2 Constant propagation
2. Common Sub Expression Elimination
3. Copy Propagation
4. Code Movement
Loop invariant computation
5. Strength Reduction
6. Dead Code Elimination
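Two of the transformations above can be illustrated as hand-applied before/after pairs in C (the function names are illustrative; a real compiler performs these rewrites on the intermediate representation, not the source):

```c
/* Compile-time evaluation (constant folding): 60 * 60 is evaluated
   once, at compile time, instead of on every call. */
static int seconds_before(int hours) { return 60 * 60 * hours; }
static int seconds_after(int hours)  { return 3600 * hours; }

/* Strength reduction: the multiplication i * 4 inside the loop is
   replaced by a running value updated with a cheaper addition. */
static int sum_before(int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += i * 4;
    return s;
}

static int sum_after(int n) {
    int s = 0, t = 0;          /* t tracks the value of i * 4 */
    for (int i = 0; i < n; i++) {
        s += t;
        t += 4;
    }
    return s;
}
```

Each pair computes the same function, which is exactly what "function preserving" means.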
Loop Optimization
Code optimization can be done significantly in the loops of a program. In particular, the inner
loop is where a program spends a large amount of its time.
Hence, if the number of instructions in the inner loop is reduced, the running time
decreases to a large extent.
2. What is meant by a common subexpression? Explain the common subexpression
elimination technique with the help of a suitable example.
DAY 11
Course: Compiler Design
Relevant MAKAUT syllabus portion: blocks & flow graphs
Objectives:
1. To understand basic blocks and flow graphs.
Notes:
Basic Block and Flow Graph
1. First, determine the leaders: the first statement is a leader, the target of any conditional or
unconditional jump is a leader, and any statement immediately following a jump is a leader.
2. For each leader, its basic block consists of the leader and all statements up to, but not including,
the next leader or the end of the program.
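Leader identification, the first step in forming basic blocks, can be sketched as one pass over an instruction list; the instruction encoding here is an illustrative assumption:

```c
#include <stdbool.h>

/* Minimal instruction encoding for leader finding: each instruction
   records whether it is a jump and, if so, its target index. */
struct instr { bool is_jump; int jump_target; };

/* Mark leaders: the first instruction, every jump target, and every
   instruction that immediately follows a jump. */
static void find_leaders(const struct instr *code, int n, bool *leader) {
    for (int i = 0; i < n; i++) leader[i] = false;
    if (n > 0) leader[0] = true;
    for (int i = 0; i < n; i++) {
        if (code[i].is_jump) {
            leader[code[i].jump_target] = true;
            if (i + 1 < n) leader[i + 1] = true;
        }
    }
}
```

Each leader then opens a basic block that runs up to, but not including, the next leader.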
Flow Graph
We can add flow-of-control information to the set of basic blocks making up a
program by constructing a directed graph called a flow graph.
The nodes of the flow graph are the basic blocks.
One node is distinguished as initial; it is the block whose leader is the first statement.
There is a directed edge from block B1 to block B2 if B2 can immediately follow B1 in some
execution sequence; that is, if
1. There is a conditional or unconditional jump from the last statement of B1 to the
first statement of B2, or
2. B2 immediately follows B1 in the order of the program, and B1 does not end in an
unconditional jump.
We say that B1 is a predecessor of B2, and B2 is a successor of B1.
DAY 12
Course: Compiler Design
Relevant MAKAUT syllabus portion: Transformation of basic blocks
Lecture 12 (60 minutes)
Objectives:
1. To understand basic blocks optimization by different transformation techniques.
Notes
Transformation of Basic Blocks
There are two important classes of local transformations that can be applied to basic
blocks:
1. Structure Preserving Transformation
a. Common subexpression elimination
b. Dead-code elimination
c. Renaming of temporary variables
d. Interchange of two independent adjacent statements
2. Algebraic Transformation
1. Consider some inter-block code optimization without any data flow analysis by treating each
extended basic block as if it is a basic block. Give algorithms to do the following optimizations
within an extended basic block. In each case, indicate what effect on other extended basic
blocks a change within one extended block can have.
i) Common sub-expression elimination
ii) Constant folding
iii) Copy propagation
3. Construct basic blocks and data flow graph and identify loop invariant statements:
for(i = 1 to n){
j = 1;
while(j <= n){
A = B * C / D;
j = j + 1;
}
}
DAY 13
Course: Compiler Design
Relevant MAKAUT syllabus portion: The DAG representation of basic blocks
Objectives:
1. To understand how to represent Basic Blocks using DAG.
Notes:
The DAG representation of basic blocks
Directed Acyclic Graph (DAG) is a useful data structure for implementing transformations
on a basic block.
A DAG gives a picture of how the value computed by each statement in a basic block is
used in subsequent statements of the block.
Constructing a DAG from three-address statements is a very good way of determining
common subexpressions within a block, determining which names are used inside the block
but evaluated outside the block, and determining which statements of the block could have
their values used outside the block.
A DAG for a basic block is a directed acyclic graph with the following labels on nodes:
1. Leaves are labelled by unique identifiers, either variable names or constants.
2. Interior nodes are labelled by an operator symbol.
3. Nodes are also optionally given a sequence of identifiers for labels. The intention is
that interior nodes represent computed values, and the identifiers labelling a node are
deemed to have that value.
DAG construction
1. If the statement is of the form x := y + z, we look for nodes that represent the "current" values
of y and z. These could be leaves, or they could be interior nodes of the DAG if y and/or
z has already been evaluated by previous statements of the block.
2. Then we create a node labelled + and give it two children, y (left child) and z (right child).
3. However, if there is already a node denoting the same value as y + z, we do not add a new
node to the DAG, but rather give the existing node the additional label x.
4. If x had previously labelled some other node, we remove that label, since the
"current" value of x is the node just created.
5. For an assignment such as x := y we do not create a new node. Rather, we append the label x
to the list of names on the node for the "current" value of y.
Initially, we assume there are no nodes, and node() is undefined for all arguments.
1. If node(y) is undefined, create a leaf labeled y, and let node(y) be this node. In case (i), if
node(z) is undefined, create a leaf labeled z and let that leaf be node(z).
2. In case (i), determine whether there is a node labelled op whose left child is node(y) and whose
right child is node(z). If not, create such a node. In either event, let n be the node found or
created. In case (ii), determine whether there is a node labelled op whose lone child is
node(y). If not, create such a node, and let n be the node found or created. In case (iii), let
n be node(y).
3. Delete x from the list of attached identifiers for node(x). Append x to the list of attached
identifiers for node n found in (2) and set node(x) to n.
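The reuse step, looking for an existing node for y op z before creating a duplicate, can be sketched in C; the array-based node encoding (leaves carry the identifier as op and have children -1) is an illustrative assumption:

```c
/* Sketch of DAG node reuse: search for an existing node with the
   same operator and children before creating a new one. */
struct dagnode { char op; int left, right; };  /* children are indices */
static struct dagnode dnodes[100];
static int n_dnodes = 0;

static int find_or_create(char op, int left, int right) {
    for (int i = 0; i < n_dnodes; i++)
        if (dnodes[i].op == op && dnodes[i].left == left &&
            dnodes[i].right == right)
            return i;               /* common subexpression: reuse node */
    dnodes[n_dnodes].op = op;
    dnodes[n_dnodes].left = left;
    dnodes[n_dnodes].right = right;
    return n_dnodes++;
}
```

Requesting y + z a second time returns the node created the first time, which is exactly how the DAG exposes common subexpressions.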
2. Generate DAG representation of the following code and list out the applications of DAG
representation:
i = 1, s = 0;
while(i < 10){
s = s + a[i][j];
i = i + 1;
}
DAY 14
Course: Compiler Design
Relevant MAKAUT syllabus portion: Loops in Flow Graph
Objectives:
1. To understand what constitute a loop in a flow-graph.
Notes:
Dominators
We say node d of a flow graph dominates node n, written as d dom n, if every path from
the initial node of the flow graph to n goes through d.
Under this definition, every node dominates itself, and the entry of a loop dominates all
the nodes in the loop.
A useful way of presenting dominator information is in a tree, called the dominator tree,
in which the initial node is the root, and each node d dominates only its descendants in
the tree.
The existence of dominator trees follows from a property of dominators: each node n has
a unique immediate dominator m that is the last dominator of n on any path from the initial
node to n. In terms of the dom relation, the immediate dominator m has the property that
if d != n and d dom n, then d dom m.
Natural Loops
One important application of dominator information is in determining the loops of a flow
graph suitable for improvement. There are two essential properties of such loops:
1. A loop must have a single entry point, called the "header." This entry
point dominates all nodes in the loop, or it would not be the sole entry to the loop.
2. There must be at least one way to iterate the loop, i.e., at least one path back to the header.
Inner Loops
A natural notion of an inner loop: one that contains no other loops.
When two loops have the same header, it is hard to tell which is the inner loop.
Pre-header
1. What are the sources of redundancy in code? Give examples using flow graphs.
2. When is a flow graph said to be reducible? What are the properties of natural loops?
DAY 15
Course: Compiler Design
Relevant MAKAUT syllabus portion: Peephole optimization
Objectives:
1. To optimize the target program using Peephole optimization technique.
Notes
Peephole Optimization
A statement-by-statement code-generation strategy often produces target code that
contains redundant instructions and suboptimal constructs. The quality of such target code
can be improved by applying "optimizing" transformations to the target program.
Peephole optimization is an effective technique for locally improving the target code.
It examines a short sequence of target instructions (called the peephole) and replaces these
instructions by a shorter or faster sequence whenever possible.
The technique can also be applied directly after intermediate code generation to improve the
intermediate representation.
The peephole is a small, moving window on the target code.
It is characteristic of peephole optimization that each improvement may spawn
opportunities for additional improvements. In general, repeated passes over the target code
are necessary to get the maximum benefit.
Transformations that are characteristic of Peephole Optimization
redundant-instruction elimination
flow-of-control optimizations
algebraic simplifications
use of machine idioms
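One classic redundant-instruction pattern, a store immediately followed by a load of the same location (MOV R0, a; MOV a, R0), can be sketched as a single peephole rule in C. The instruction encoding is an illustrative assumption, and in practice the load may be deleted only if it is not a jump target:

```c
#include <string.h>

/* Toy target instruction: MOV src, dst (and other ops). */
struct tinstr { char op[4]; char src[8]; char dst[8]; };

/* One peephole rule: drop a load that immediately follows a store of
   the same register/location pair. Returns the new instruction count.
   (Safe only when the deleted load carries no label.) */
static int eliminate_redundant_loads(struct tinstr *code, int n) {
    int out = 0;
    for (int i = 0; i < n; i++) {
        if (out > 0 &&
            strcmp(code[i].op, "MOV") == 0 &&
            strcmp(code[out - 1].op, "MOV") == 0 &&
            strcmp(code[out - 1].src, code[i].dst) == 0 &&
            strcmp(code[out - 1].dst, code[i].src) == 0)
            continue;                      /* drop the redundant load */
        code[out++] = code[i];
    }
    return out;
}
```

A full peephole optimizer is a collection of such pattern rules applied repeatedly over the window.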
DAY 16
Course: Compiler Design
Relevant MAKAUT syllabus portion: Issues in the design of code generator
Objectives:
1. To understand concepts of code generation and issues in design of Code Generator.
Notes:
Code Generator
It takes as input an intermediate representation of the source code and produces as output an
equivalent target program.
Code generation is the process of creating assembly-language / machine-language statements
which will perform the operations specified by the source program when they run.
Properties:
Correctness.
High Quality.
Efficient use of resources of the target machine.
Quick code generation.
DAY 17
Course: Compiler Design
Relevant MAKAUT syllabus portion: A simple code generator
Objectives:
1. To understand construction of a simple code generator.
Notes:
Next-Use Information
• Next-use information is needed for dead-code elimination and register assignment.
• Next-use information is computed by a backward scan of a block, performing the following
actions on statements:
Algorithm
For statement i : x := y op z
- attach the current liveness / next-use info for x, y and z to statement i
- set x to "not live" and "no next use"
- set y and z to "live" and the next use of y and z to i.
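The backward scan can be sketched over an array of x := y op z statements. The encoding (symbol-table indices, NONE for "no next use") is an illustrative assumption, and liveness is implied here by a non-NONE next use:

```c
#define NONE -1

/* One x := y op z statement: symbol-table indices for the names,
   plus the next-use info the scan attaches to the statement. */
struct stmt {
    int x, y, z;      /* symbol-table indices of the operands   */
    int nx, ny, nz;   /* recorded next-use info (NONE if none)  */
};

/* Backward scan: attach the table's current info to each statement,
   then update the table for the statements above it. */
static void next_use(struct stmt *b, int n, int *next, int nsyms) {
    for (int s = 0; s < nsyms; s++) next[s] = NONE;
    for (int i = n - 1; i >= 0; i--) {
        b[i].nx = next[b[i].x];
        b[i].ny = next[b[i].y];
        b[i].nz = next[b[i].z];
        next[b[i].x] = NONE;   /* x is defined here: no next use */
        next[b[i].y] = i;      /* y and z are used at i          */
        next[b[i].z] = i;
    }
}
```

For the block t := a op b; w := t op a, the scan records that t (defined in the first statement) is next used in the second, while b has no next use in the block.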
getreg() algorithm
1. If y is stored in a register R, R holds only the value of y, and y has no next use, then
return R;
update the address descriptor: the value of y is no longer in R.
2. Else, return a new empty register, if available.
3. Else, find an occupied register R;
store its contents (register spill) by generating
MOV R, M
for every M in the address descriptor of the variable currently held in R;
return register R.
4. Else, return a memory location.
3. Generate code for the following C statement for the target machine, assuming all variables
are static.
x = a / (b + c) – d * (e + f)
DAY 18
Course: Compiler Design
Relevant MAKAUT syllabus portion: Register allocation & assignment
Objectives:
1. To understand register allocation and assignment during code generation.
Notes:
• Global register allocation assigns variables to a limited number of available registers and
attempts to keep these registers consistent across basic block boundaries.
• Suppose loading a variable x has a cost of 2.
• Suppose storing a variable x has a cost of 2.
• The benefit of allocating a register to a variable x within a loop L is
∑ block B in L (use(x, B) + 2 * live(x, B))
where use(x, B) is the number of times x is used in B, and live(x, B) is 1 if x is live on
exit from B and 0 otherwise.
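The benefit formula can be evaluated directly; the use and live counts below are illustrative:

```c
/* Benefit of keeping x in a register throughout loop L:
   sum over blocks B in L of use(x, B) + 2 * live(x, B), where
   live(x, B) is 1 if x is live on exit from B and 0 otherwise. */
static int register_benefit(const int *use, const int *live, int nblocks) {
    int b = 0;
    for (int i = 0; i < nblocks; i++)
        b += use[i] + 2 * live[i];
    return b;
}
```

For a three-block loop with use counts {1, 0, 2} and liveness {1, 0, 1}, the benefit is (1 + 2) + 0 + (2 + 2) = 7 saved cost units per loop execution.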
1. What are the uses of register and address descriptors in code generation?