
Paper Name: Compiler Design Paper Code: IT605C

Institute of Engineering & Management

Department of Information Technology


Workbook (IT605D)

Paper Name with code: Compiler Design (IT605C)


Name of the Teacher: Arup Kumar Chattopadhyay

Name of the Student:


Year: Section:
Class Roll No.:
University Roll No.:

1|P ag e Study Material, IEM, IT Department


1. MAKAUT Syllabus
Paper name: Compiler Design
Code: IT605C
Contacts: 3L +1T
Credits: 4

Pre-requisites:
CS402 (Formal Language & Automata Theory),
CS 201 (Basic Computation and Principles of C),
CS302 (Data Structure & Algorithm)

Introduction to Compiling [3L]
Compilers, Analysis of the source program, The phases of the compiler, Cousins of the
compiler.
Lexical Analysis [6L]
The role of the lexical analyzer, Tokens, Patterns, Lexemes, Input buffering, Specifications of
a token, Recognition of tokens, Finite automata, From a regular expression to an NFA,
From a regular expression to a DFA, Design of a lexical
analyzer generator (Lex).
Syntax Analysis [9L]
The role of a parser, Context free grammars, Writing a grammar, Top down Parsing, Non-
recursive Predictive parsing (LL), Bottom up parsing, Handles, Viable prefixes, Operator
precedence parsing, LR parsers (SLR, LALR), Parser generators (YACC). Error Recovery
strategies for different parsing techniques.
Syntax directed translation [5L]
Syntax directed definitions, Construction of syntax trees, Bottom-up evaluation of S-
attributed definitions, L-attributed definitions, Bottom-up evaluation of inherited attributes.
Type checking [4L]
Type systems, Specification of a simple type checker, Equivalence of type expressions, Type
conversions
Run time environments [5L]
Source language issues (Activation trees, Control stack, scope of declaration, Binding of
names), Storage organization (Subdivision of run-time memory, Activation records),
Storage allocation strategies, Parameter passing (call by value, call by reference, copy
restore, call by name), Symbol tables, dynamic storage allocation techniques.
Intermediate code generation [4L]
Intermediate languages, Graphical representation, Three-address code, Implementation
of three address statements (Quadruples, Triples, Indirect triples).
Code optimization [5L]
Introduction, Basic blocks & flow graphs, Transformation of basic blocks, DAG
representation of basic blocks, The principal sources of optimization, Loops in flow
graph, Peephole optimization.
Code generation [4L]
Issues in the design of code generator, a simple code generator, Register allocation &
assignment.

2. Recommended Books:
• Aho, Sethi, Ullman - “Compiler Principles, Techniques and Tools” - Pearson Education.
• Holub - “Compiler Design in C” - PHI.

3. Course Outcomes:
• Understand lexical analyzer and parser generator tools.
• Build symbol tables and generate intermediate code.
• Generate assembly code for a RISC machine.
• Implement a parser such as a bottom-up SLR parser and implement semantic rules in a parser
that performs attribution while parsing.
• Understand compiler architecture, register allocation and compiler optimization.

4. Day-wise Lesson Plan with book references:


Sl. No | Day | Module | Topic | Video Links (Optional) | Recommended books for the topic
1 | Day-1 | Run time environments | What is a runtime environment? Subdivision of runtime memory, basics of activation record structure | | Aho, Sethi, Ullman - “Compiler Principles, Techniques and Tools” - Pearson Education.
2 | Day-2 | Run time environments | Activation Tree, Control Stack and step-by-step construction of activation tree, control stack for basic C functions | | Aho, Sethi, Ullman - “Compiler Principles, Techniques and Tools” - Pearson Education.
3 | Day-3 | Run time environments | The Scope of a Declaration, Binding of Names, Storage allocation strategies | | Aho, Sethi, Ullman - “Compiler Principles, Techniques and Tools” - Pearson Education.
4 | Day-4 | Run time environments | Parameter passing: call by value, call by reference, copy restore, call by name | | Aho, Sethi, Ullman - “Compiler Principles, Techniques and Tools” - Pearson Education.
5 | Day-5 | Run time environments | Symbol tables, dynamic storage allocation techniques | | Aho, Sethi, Ullman - “Compiler Principles, Techniques and Tools”
6 | Day-6 | Intermediate code generation | Need of Intermediate Languages, Graphical Representation: Syntax Tree, DAG, POSTFIX expression | | Aho, Sethi, Ullman - “Compiler Principles, Techniques and Tools”
7 | Day-7 | Intermediate code generation | Three-address code; Types of Three-Address Code | | Aho, Sethi, Ullman - “Compiler Principles, Techniques and Tools”
8 | Day-8 | Intermediate code generation | An Overview of Implementation of Three-Address Statements; Syntax-Directed Translation into Three-Address Code | | Aho, Sethi, Ullman - “Compiler Principles, Techniques and Tools”
9 | Day-9 | Intermediate code generation | Backpatching | | Aho, Sethi, Ullman - “Compiler Principles, Techniques and Tools”
10 | Day-10 | Code optimization | Introduction to Code Optimization, The principal sources of optimization | | Aho, Sethi, Ullman - “Compiler Principles, Techniques and Tools”
11 | Day-11 | Code optimization | Basic Blocks and Flow Graphs | | Aho, Sethi, Ullman - “Compiler Principles, Techniques and Tools”
12 | Day-12 | Code optimization | Transformation of basic blocks | | Aho, Sethi, Ullman - “Compiler Principles, Techniques and Tools”
13 | Day-13 | Code optimization | The DAG representation of basic blocks | | Aho, Sethi, Ullman - “Compiler Principles, Techniques and Tools”
14 | Day-14 | Code optimization | Loops in Flow Graph: Dominators, Inner loop, Reducible Flow Graphs | | Aho, Sethi, Ullman - “Compiler Principles, Techniques and Tools”
15 | Day-15 | Code optimization | Peephole optimization | | Aho, Sethi, Ullman - “Compiler Principles, Techniques and Tools”
16 | Day-16 | Code Generation | Issues in the Design of Code Generator | | Aho, Sethi, Ullman - “Compiler Principles, Techniques and Tools”
17 | Day-17 | Code Generation | Next-Use Information, A Simple Code Generator | | Aho, Sethi, Ullman - “Compiler Principles, Techniques and Tools”
18 | Day-18 | Code Generation | Register allocation & assignment | | Aho, Sethi, Ullman - “Compiler Principles, Techniques and Tools”

5. Course Information
PROGRAMME: Information Technology DEGREE: B.Tech

COURSE: Compiler Design SEMESTER: 5 CREDITS: 4

COURSE CODE: IT605C COURSE TYPE: Theory

CORRESPONDING LAB COURSE CODE (IF ANY): NIL

CONTACT HOURS: 44

DAY 1
Course: Compiler Design
Relevant MAKAUT syllabus portion: Storage organization (Subdivision of run-time
memory, Activation records)

Lecture 1 (60 minutes)

Topics Covered: What is a runtime environment? Subdivision of runtime memory, basics of activation record structure

Prerequisites: Have you Read

• CS 201 (Basic Computation and Principles of C)

• CS302 (Data Structure & Algorithm)

Objectives:
1. To understand subdivision of runtime memory and how the memory subdivisions are
utilized.
2. To understand activation record structure.

Notes:
Run-Time Environments
• The abstractions embodied in the source-language definition include names, scopes, bindings,
data types, operators, procedures, parameters, and flow-of-control constructs.
• A compiler must accurately implement these abstractions and also must cooperate with the
operating system and other systems software to support these abstractions on the target
machine.
• To do so, the compiler creates and manages a run-time environment in which it assumes its
target programs are being executed.
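Storage Organization
Run-time memory is divided into four areas:
• Code area: the generated target code, whose size is fixed at compile time.
• Static area: data objects whose size and location are known at compile time, such as global and static variables.
• Heap area: storage that is allocated and freed dynamically while the program runs.
• Stack area: activation records, pushed when a procedure is called and popped when it returns.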

Activation Record
The activation record is a block of memory used for managing information needed by a
single execution of a procedure.
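The fields of an activation record are as follows:
• Temporary values
• Local variables
• Saved machine registers
• Control link: points to the activation record of the caller
• Access link: used to reach nonlocal data held in other activation records
• Actual parameters
• Return value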
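To make these fields concrete, here is a minimal C sketch that evaluates a small recursive function, sum_to(n) = n + sum_to(n - 1) with sum_to(0) = 0, using an explicit stack of activation records instead of the machine stack. The struct activation_record, its field names and the function sum_to_explicit are hypothetical teaching names; saved registers and access links are left out, and a real layout depends on the target machine and calling convention. Because each activation of sum_to makes its recursive call before doing any other work, the sketch can push all the calls first and then process the returns in last-in, first-out order.

```c
/* Hypothetical, simplified activation record for int sum_to(int n).
   Saved registers and access links are omitted in this sketch. */
struct activation_record {
    int actual_param;   /* actual parameter n */
    int temp;           /* temporary: value returned by the callee */
    int return_value;   /* value this activation returns to its caller */
    int control_link;   /* index of the caller's record, -1 for the first call */
};

#define MAX_RECORDS 64

/* Evaluates sum_to(n) with an explicit control stack of records.
   Stores in *max_live the largest number of records live at once.
   Returns -1 if n is too large for the stack. */
int sum_to_explicit(int n, int *max_live) {
    struct activation_record stack[MAX_RECORDS];
    int top = -1;
    *max_live = 0;

    /* Calls: push one record per activation sum_to(n), sum_to(n-1), ..., sum_to(0). */
    for (int k = n; k >= 0; k--) {
        if (top + 1 >= MAX_RECORDS)
            return -1;
        top++;
        stack[top].actual_param = k;
        stack[top].temp = 0;
        stack[top].return_value = 0;
        stack[top].control_link = top - 1;
    }
    *max_live = top + 1;

    /* Returns: pop records in LIFO order; each callee hands its return
       value to its caller's record through the control link. */
    int result = 0;
    while (top >= 0) {
        struct activation_record *ar = &stack[top];
        ar->return_value = (ar->actual_param == 0) ? 0 : ar->actual_param + ar->temp;
        if (ar->control_link >= 0)
            stack[ar->control_link].temp = ar->return_value;
        else
            result = ar->return_value;   /* record of the original call */
        top--;
    }
    return result;
}
```

For example, sum_to_explicit(4, &live) returns 10 and sets live to 5, one record each for sum_to(4) down to sum_to(0).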

1. What is the significance of an activation record?

2. Explain the subdivisions of logical memory for executing the target code.

DAY 2
Course: Compiler Design
Relevant MAKAUT syllabus portion: Activation trees, Control stack

Lecture 2 (60 minutes)

Topics Covered: Activation Tree, Control Stack and step-by-step construction of activation tree, control stack for basic C functions

Prerequisites: Have you Read

• CS 201 (Basic Computation and Principles of C)

• CS302 (Data Structure & Algorithm)

Objectives:
1. To understand activation tree and control stack.

Notes:
Activation Trees
Stack allocation would not be feasible if procedure calls, or activations of procedures, did not
nest in time.
If an activation of procedure p calls procedure q, then that activation of q must end before the
activation of p can end. There are three common cases:
1. The activation of q terminates normally. Then in essentially any language, control resumes
just after the point of p at which the call to q was made.
2. The activation of q, or some procedure q called, either directly or indirectly, aborts; i.e., it
becomes impossible for execution to continue. In that case, p ends simultaneously with q.
3. The activation of q terminates because of an exception that q cannot handle.
Procedure p may handle the exception, in which case the activation of q has terminated while
the activation of p continues, although not necessarily from the point at which the call to q
was made. If p cannot handle the exception, then this activation of p terminates at the same
time as the activation of q, and presumably the exception will be handled by some other open
activation of a procedure.
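As an illustration (the Fibonacci function is an example chosen here, not one from the notes), the C sketch below prints the activation tree of fib(n): each call is a node, and the calls it makes appear as its children, indented one level deeper, in the order they are made. The two recursive calls are made in separate statements because C leaves the evaluation order of the operands of + unspecified. The names fib and print_activation_tree are hypothetical.

```c
#include <stdio.h>

static int tree_nodes = 0;   /* nodes of the activation tree seen so far */

/* Each call of fib is one node of the activation tree; the calls it
   makes become its children, printed one indentation level deeper. */
static int fib(int n, int depth) {
    tree_nodes++;
    printf("%*sfib(%d)\n", 2 * depth, "", n);
    if (n < 2)
        return n;
    int first = fib(n - 1, depth + 1);    /* left child: called first */
    int second = fib(n - 2, depth + 1);   /* right child: called second */
    return first + second;
}

/* Prints the activation tree of fib(n) and returns its number of nodes. */
int print_activation_tree(int n) {
    tree_nodes = 0;
    fib(n, 0);
    return tree_nodes;
}
```

print_activation_tree(3) prints fib(3), then fib(2) with its children fib(1) and fib(0), and finally fib(1), and returns 5; the tree for fib(4) has 9 nodes.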

Control Stacks
The control stack keeps track of live procedure activations. The idea is to push the node for an
activation onto the control stack as the activation begins and to pop the node when the
activation ends. The contents of the control stack are then related to paths to the root of the
activation tree: when node n is at the top of the control stack, the stack contains the nodes
along the path from n to the root.
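The following C sketch makes the control stack explicit for a recursive Fibonacci function (an example chosen here; the names control_stack, fib_traced and path are hypothetical). Each activation pushes its argument when it begins and pops it when it ends. When the first leaf activation is reached, the sketch copies the stack, which at that moment holds exactly the path from that node back to the root.

```c
#define MAX_DEPTH 32

static int control_stack[MAX_DEPTH];  /* argument n of each live activation */
static int cs_top = 0;                /* number of live activations */
static int path[MAX_DEPTH];           /* stack contents at the first leaf */
static int path_len = -1;             /* -1 until the first leaf is reached */

/* fib with an explicit control stack; assumes n < MAX_DEPTH. */
int fib_traced(int n) {
    control_stack[cs_top++] = n;      /* activation begins: push its node */
    int result;
    if (n < 2) {
        if (path_len < 0) {           /* first leaf: copy the root-to-leaf path */
            for (int i = 0; i < cs_top; i++)
                path[i] = control_stack[i];
            path_len = cs_top;
        }
        result = n;
    } else {
        int first = fib_traced(n - 1);
        int second = fib_traced(n - 2);
        result = first + second;
    }
    cs_top--;                         /* activation ends: pop its node */
    return result;
}
```

After fib_traced(4) returns 3, path holds 4, 3, 2, 1: the activations fib(4), fib(3), fib(2) and fib(1) that were live when fib(1) was at the top of the stack.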

1. Taking the factorial program as an example, explain what the activation record looks like for
every recursive call of factorial(3).

2. Write a quicksort procedure. Draw the activation tree when the numbers 9, 8, 7, 6, 5, 4,
3, 1 are sorted. What is the largest number of activation records that can appear on the stack?

3. What is the purpose of the control stack used in run-time storage organization?

DAY 3
Course: Compiler Design
Relevant MAKAUT syllabus portion: Scope of declaration, Binding of names, Storage
allocation strategies

Lecture 3 (60 minutes)

Topics Covered: The Scope of a Declaration, Binding of Names, Storage allocation strategies

Prerequisites: Have you Read

• CS 201 (Basic Computation and Principles of C)

• CS302 (Data Structure & Algorithm)

Objectives:
1. To understand scope of declaration and scope rules.
2. To understand stack allocation and heap allocation.

Notes:

The Scope of a Declaration


A declaration in a language is a syntactic construct that associates information with a name.
The scope rules of a language determine which declaration of a name applies when the name
appears in the text of a program.
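C uses the lexical (most closely nested) scope rule for blocks. In this small sketch (the function scope_demo is hypothetical) there are two declarations of x, and the scope rules decide which one each occurrence of x denotes.

```c
/* Block-structured scope in C: each declaration of x has its own scope. */
int scope_demo(void) {
    int x = 1;          /* declaration 1: scope is the whole function body */
    int sum = 0;
    {
        int x = 2;      /* declaration 2: hides declaration 1 inside this block */
        sum += x;       /* this occurrence of x refers to declaration 2 */
    }
    sum += x;           /* declaration 2 is out of scope: refers to declaration 1 */
    return sum;
}
```

scope_demo() returns 3: the inner occurrence of x contributes 2 and the outer one contributes 1.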

Binding of Names
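Even if each name is declared once in a program, the same name may denote different data
objects at run time. A data object corresponds to a storage location that can hold values. The
environment maps a name to a storage location, and the state maps a storage location to the
value it holds.
A binding is the dynamic counterpart of a declaration. Each static notion has a dynamic
counterpart:
1. definition of a procedure: activations of the procedure
2. declaration of a name: bindings of the name
3. scope of a declaration: lifetime of the binding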

Storage Allocation Strategies
There are three storage allocation strategies, one for each data area of run-time memory:
1. Static allocation: storage for all data objects is laid out at compile time.
2. Stack allocation: run-time storage is managed as a stack of activation records.
3. Heap allocation: storage is allocated and deallocated at run time from the heap.
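A minimal C sketch of the three strategies, with hypothetical function names: the static local counter in next_ticket is statically allocated and keeps its value between calls, the local step lives in each activation's stack frame, and make_squares obtains a heap block with malloc that outlives the call and must be released with free.

```c
#include <stdlib.h>

/* Static allocation: counter gets one location for the whole run. */
int next_ticket(void) {
    static int counter = 0;   /* statically allocated, keeps its value between calls */
    int step = 1;             /* stack allocated: a fresh location in each activation */
    counter += step;
    return counter;
}

/* Heap allocation: the array outlives the activation that created it
   and must be released explicitly by the caller with free(). */
int *make_squares(int n) {
    if (n <= 0)
        return NULL;
    int *a = malloc((size_t)n * sizeof *a);
    if (a == NULL)
        return NULL;
    for (int i = 0; i < n; i++)
        a[i] = i * i;
    return a;
}
```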

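Access to nonlocal names
1. Lexical (static) scope rule: the declaration that applies to a name is determined from the
program text alone, using the most closely nested rule (as in C and Pascal).
2. Dynamic scope rule: the declaration that applies is determined at run time, by the most recent
active binding of the name.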
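C itself follows the lexical scope rule, so dynamic scope can only be simulated. The sketch below, with hypothetical names push_binding, pop_binding and lookup_dynamic, keeps a run-time stack of (name, value) bindings: entering a procedure that declares a name pushes a binding, leaving it pops the binding, and a nonlocal reference uses the most recent binding still on the stack.

```c
#include <string.h>

#define MAX_BINDINGS 16

/* Hypothetical run-time stack of (name, value) bindings. */
struct binding { const char *name; int value; };
static struct binding bindings[MAX_BINDINGS];
static int nbindings = 0;

/* Entering a procedure that declares `name` pushes a new binding. */
int push_binding(const char *name, int value) {
    if (nbindings == MAX_BINDINGS)
        return 0;
    bindings[nbindings].name = name;
    bindings[nbindings].value = value;
    nbindings++;
    return 1;
}

/* Leaving that procedure pops its binding. */
void pop_binding(void) {
    if (nbindings > 0)
        nbindings--;
}

/* Dynamic scope rule: a name refers to the most recent binding still on
   the stack, i.e. the one made by the most recently called active
   procedure that declares it. Returns 0 if the name is unbound. */
int lookup_dynamic(const char *name, int *value) {
    for (int i = nbindings - 1; i >= 0; i--) {
        if (strcmp(bindings[i].name, name) == 0) {
            *value = bindings[i].value;
            return 1;
        }
    }
    return 0;
}
```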

1. Compare static, stack and heap allocation.

2. Using the scope rules of Pascal, determine the declarations that apply to each occurrence of
the names a and b in the code segment below. The output of the program consists of the
integers 1 through 4.

program a(input, output);
  procedure b(u, v, x, y: integer);
    var a : record a, b : integer end;
        b : record b, a : integer end;
  begin
    with a do begin a := u; b := v end;
    with b do begin a := x; b := y end;
    writeln(a.a, a.b, b.a, b.b)
  end;
begin
  b(1, 2, 3, 4)
end.

3. What is the output of the following C program if the compiler uses dynamic scope? Briefly
justify your answer.
#include <stdio.h>

int r;
void write(void){
    printf("%d", r);
}
void display(void){
    int r = 37.24;
    write();
}
int main(void){
    r = 11.34;
    write();
    display();
    return 0;
}

DAY 4
Course: Compiler Design
Relevant MAKAUT syllabus portion: Parameter passing (call by value, call by reference,
copy restore, call by name)

Course Outcomes:

Lecture 4 (60 minutes)

Topics Covered: Parameter passing - call by value, call by reference, copy restore, call by name.
Prerequisites: Have you Read

• CS 201 (Basic Computation and Principles of C)

• CS302 (Data Structure & Algorithm)

Objectives:
1. To understand the different parameter passing techniques for procedure calls.

Notes:
Parameter Passing
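There are two kinds of parameters:
- Formal parameters: the names that appear in the definition of a procedure.
- Actual parameters: the arguments supplied in a call of the procedure.
Based on how actual parameters are associated with formal parameters, there are several
parameter passing methods; the most common are:
1. call by value
2. call by reference
3. call by value-result (copy-restore)
4. call by name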
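C passes every argument by value, so the sketch below models the other methods by hand: pointers stand in for call by reference, explicit copy-in/copy-out code for copy-restore, and a thunk (a function that re-evaluates the argument expression) for call by name. All function and type names are hypothetical, and the models ignore details such as how a real compiler computes the locations to copy back.

```c
/* Call by value: the formal x is a fresh copy of the actual parameter. */
void inc_by_value(int x) {
    x = x + 1;                      /* changes only the copy */
}

/* Call by reference, modeled with pointers: the formals denote the
   caller's locations, so every update is visible to the caller at once. */
void inc_both_by_reference(int *x, int *y) {
    *x = *x + 1;
    *y = *y + 1;
}

/* Copy-restore (value-result), modeled by hand: copy the values in,
   work on local copies, then copy the results back out on return. */
void inc_both_copy_restore(int *x_loc, int *y_loc) {
    int x = *x_loc, y = *y_loc;     /* copy in */
    x = x + 1;
    y = y + 1;
    *x_loc = x;                     /* copy out, left to right */
    *y_loc = y;
}

/* Call by name, modeled with a thunk: a function that re-evaluates the
   argument expression each time the formal parameter is used. */
struct thunk {
    int (*eval)(void *env);         /* evaluates the argument expression */
    void *env;                      /* data the expression refers to */
};

int sum_twice_by_name(struct thunk x) {
    int first = x.eval(x.env);      /* first use of the formal */
    int second = x.eval(x.env);     /* second use evaluates the expression again */
    return first + second;
}

int sum_twice_by_value(int x) {
    return x + x;                   /* the argument was evaluated once, at the call */
}

/* Thunk for the argument expression ++c, where env points to c. */
int pre_increment(void *env) {
    int *c = env;
    return ++*c;
}
```

Passing the same variable twice shows where the methods differ: with a starting at 0, inc_both_by_reference(&a, &a) leaves a equal to 2, while with b starting at 0, inc_both_copy_restore(&b, &b) copies back two independently incremented copies of 0, leaving b equal to 1. Likewise, sum_twice_by_name evaluates ++c at each use and returns 1 + 2 = 3, whereas sum_twice_by_value(++d), with d starting at 0, evaluates ++d once and returns 2.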

1. Why do we need parameter passing for a procedure call?

2. Write the output of the following C program under each of the following parameter passing
methods (ignore the parameter passing syntax):
i) call by value ii) call by reference
iii) call by value-result iv) call by name
#include <stdio.h>

int i;
int j;
void p(int x, int y){
    x += 1;
    i += 1;
    y += 1;
}
void swap(int x, int y){
    /* body assumed here: exchange the two parameters */
    int t = x;
    x = y;
    y = t;
}
int main(void){
    int a[2] = {1, 1};
    int b[3] = {1, 2, 0};
    p(a[i], a[i]);
    printf("%d, %d", a[0], a[1]);
    swap(j, a[j]);
    printf("%d, %d, %d", b[0], b[1], b[2]);
    return 0;
}

DAY 5
Course: Compiler Design
Relevant MAKAUT syllabus portion: Symbol tables, dynamic storage allocation
techniques.

Course Outcomes:

Lecture 5 (60 minutes)

Topics Covered: Symbol tables, dynamic storage allocation techniques.

Prerequisites: Have you Read

• CS 201 (Basic Computation and Principles of C)

• CS302 (Data Structure & Algorithm)

Objectives:
1. To understand symbol table: use, construction and management.
2. To understand dynamic memory allocation.
Notes:
Symbol Tables
- A compiler uses a symbol table to keep track of scope and binding information about
names.
- The table is searched every time a name is encountered in the source text.
- A symbol-table mechanism must allow us to add new entries and find existing entries
efficiently. Each scheme is evaluated on the basis of the time required to add n entries and
make e inquiries.

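How to store the names in symbol tables?
1. Fixed-length names: each entry reserves a fixed amount of space for the name, which wastes
space for short names and limits the length of a name.
2. Variable-length names: the characters of all names are kept in a separate array, and each
entry stores a pointer to the start of its name together with its length.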

Symbol Table Management


1. List data structure for symbol-table
2. Self organizing list
3. Hash tables
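A minimal C sketch of the hash-table scheme, using separate chaining: each bucket holds a linked list of entries, insert adds a name at the front of its bucket, and lookup scans only that bucket. The table size, the hash function and the single type attribute are illustrative assumptions, and the names struct symbol, insert and lookup are hypothetical. Names are stored as separately allocated strings, in the spirit of the variable-length scheme above.

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 211   /* a prime table size, chosen arbitrarily for this sketch */

/* One symbol-table entry; `type` stands in for whatever attributes the
   compiler records (type, scope level, storage offset, ...). */
struct symbol {
    char *name;
    int type;
    struct symbol *next;   /* next entry in the same bucket (chaining) */
};

static struct symbol *buckets[NBUCKETS];

/* Simple multiplicative string hash (an illustrative choice). */
static unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s)
        h = h * 31u + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Find: returns the entry for name, or NULL if it is not in the table. */
struct symbol *lookup(const char *name) {
    for (struct symbol *p = buckets[hash(name)]; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/* Add: inserts name at the front of its bucket unless it is already present. */
struct symbol *insert(const char *name, int type) {
    struct symbol *p = lookup(name);
    if (p != NULL)
        return p;
    p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->name = malloc(strlen(name) + 1);   /* name stored separately, any length */
    if (p->name == NULL) {
        free(p);
        return NULL;
    }
    strcpy(p->name, name);
    p->type = type;
    unsigned h = hash(name);
    p->next = buckets[h];
    buckets[h] = p;
    return p;
}
```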

Dynamic Storage Allocation Techniques

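• Explicit allocation: the program itself requests and releases storage (for example, with malloc
and free in C).
o Explicit Allocation for Fixed Sized Blocks
o Explicit Allocation for Variable Sized Blocks
• Implicit allocation: the run-time system, rather than the program, determines when storage is
no longer needed and reclaims it (for example, by reference counting or garbage collection).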
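For explicit allocation of fixed-sized blocks, a common technique is to keep the free blocks on a linked list threaded through the blocks themselves, so that both allocation and deallocation take constant time. The C sketch below assumes a small static pool of 4 blocks of 32 bytes; the pool size and the names pool_init, pool_alloc and pool_free are hypothetical.

```c
#include <stddef.h>

#define BLOCK_SIZE 32    /* bytes per block, assumed for this sketch */
#define NUM_BLOCKS 4

/* Each free block stores a pointer to the next free block in its own
   storage, so the free list needs no extra memory. */
union block {
    union block *next_free;
    unsigned char bytes[BLOCK_SIZE];
};

static union block pool[NUM_BLOCKS];
static union block *free_list = NULL;

/* Links every block of the pool into the free list. */
void pool_init(void) {
    for (int i = 0; i < NUM_BLOCKS - 1; i++)
        pool[i].next_free = &pool[i + 1];
    pool[NUM_BLOCKS - 1].next_free = NULL;
    free_list = &pool[0];
}

/* Allocation: take the block at the head of the free list (O(1)).
   Returns NULL when every block is in use. */
void *pool_alloc(void) {
    union block *b = free_list;
    if (b != NULL)
        free_list = b->next_free;
    return b;
}

/* Deallocation: push the block back onto the head of the free list (O(1)).
   p must be a pointer previously returned by pool_alloc. */
void pool_free(void *p) {
    union block *b = p;
    b->next_free = free_list;
    free_list = b;
}
```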

1. Why do we need a symbol table?

2. How are symbol tables managed?

3. Explain implicit and explicit storage requests.

DAY 6
Course: Compiler Design
Relevant MAKAUT syllabus portion: Intermediate languages, Graphical representation

Lecture 6 (60 minutes)

Topics Covered: Need of Intermediate Languages, Graphical Representation: Syntax Tree, DAG, POSTFIX expression
Prerequisites: Have you Read

• CS 201 (Basic Computation and Principles of C)

• CS302 (Data Structure & Algorithm)

Objectives:
1. To understand the different graphical representations of intermediate code.

Notes:
Intermediate Code Generation
In the analysis-synthesis model of a compiler, the front end translates a source program into
an intermediate representation from which the back end generates target code.
Benefits of machine-independent intermediate form are:
1. Retargeting is facilitated; a compiler for a different machine can be created by attaching a
back end for the new machine to an existing front end.
2. A machine-independent code optimizer can be applied to the intermediate representation.
Graphical Representation
• A syntax tree depicts the natural hierarchical structure of a source program.
• A DAG gives the same information in a more compact way because common
subexpressions are identified.
Syntax tree: represents the constructs in the source program; the children of a node represent
the meaningful components of a construct.
DAG (directed acyclic graph): identifies the common subexpressions (subexpressions that
occur more than once) of an expression.
• More compact than a syntax tree.
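One way to build a DAG instead of a syntax tree is to check, before creating a node, whether a node with the same operator and the same children already exists, and to reuse it if so. The C sketch below (hypothetical names; a linear search stands in for the hash table a compiler would normally use) builds the DAG for a + a * (b - c) + (b - c) * d, an example chosen here.

```c
#include <string.h>

#define MAX_NODES 64

/* A node is a leaf (op == 0, holding a name) or an operator applied to
   two existing nodes, identified by their indices. */
struct node {
    char op;            /* operator for interior nodes, 0 for leaves */
    int left, right;    /* child indices, -1 for leaves */
    const char *name;   /* identifier for leaves, NULL otherwise */
};

static struct node nodes[MAX_NODES];
static int num_nodes = 0;

/* Returns an existing node with the same label and children if there is
   one (a common subexpression); otherwise creates a new node.
   Assumes fewer than MAX_NODES distinct nodes. */
static int make_node(char op, int left, int right, const char *name) {
    for (int i = 0; i < num_nodes; i++) {
        const struct node *n = &nodes[i];
        if (n->op == op && n->left == left && n->right == right &&
            (op != 0 || strcmp(n->name, name) == 0))
            return i;                   /* reuse the existing node */
    }
    nodes[num_nodes] = (struct node){ op, left, right, name };
    return num_nodes++;
}

static int leaf(const char *name) { return make_node(0, -1, -1, name); }
static int interior(char op, int l, int r) { return make_node(op, l, r, NULL); }

/* Builds the DAG for a + a * (b - c) + (b - c) * d; returns its node count. */
int build_example_dag(void) {
    num_nodes = 0;
    int a = leaf("a"), b = leaf("b"), c = leaf("c"), d = leaf("d");
    int b_minus_c = interior('-', b, c);
    int sum = interior('+', a, interior('*', a, b_minus_c));
    int again = interior('-', leaf("b"), leaf("c"));   /* finds the existing b - c node */
    int prod = interior('*', again, d);
    interior('+', sum, prod);
    return num_nodes;
}
```

build_example_dag() returns 9: the leaves a, b, c and d, the single shared node for b - c, and four other operator nodes. The syntax tree for the same expression has 13 nodes, because it repeats the leaf a and the whole subtree for b - c.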

1. Draw the syntax tree for the following expression: a * - (b + c / d)

2. Draw a syntax tree and DAG for the expression: A = B * - C + B * - C.

3. Translate the arithmetic expression a * - (b + c) into syntax tree and postfix notation.

4. Draw the syntax tree and postfix notation for the following expression:
(a + (b * c)) ^ d – e / (f + g)

DAY 7
Course: Compiler Design
Relevant MAKAUT syllabus portion: Three-address code, Quadruples, Triples, Indirect
triples.

Lecture 7 (60 minutes)

Topics Covered: Three-address code; Types of Three-Address Code;

Prerequisites: Have you Read

• CS 201 (Basic Computation and Principles of C)

• CS302 (Data Structure & Algorithm)

Objectives:
1. To understand three-address code representation for source code.
Notes:
Three-Address Code
In three-address code, there is at most one operator on the right side of an instruction; that is,
no built-up arithmetic expressions are permitted.
For example, x + y * z might be translated into the sequence of three-address instructions:
t1 = y * z
t2 = x + t1
where t1 and t2 are compiler-generated temporary names.
“Three-address code is a linearized representation of a syntax tree or a DAG in which explicit
names correspond to the interior nodes of the graph.”
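The translation can be sketched as a post-order walk of the syntax tree: the code for both operands is generated first, and then a new temporary is created for the interior node. The C sketch below (hypothetical names struct expr, gen and three_address) assumes binary operators only and a fixed-size output buffer.

```c
#include <stdio.h>
#include <string.h>

/* Expression tree: a leaf holds an identifier, an interior node an operator. */
struct expr {
    char op;                         /* '+', '-', '*', '/', or 0 for a leaf */
    const char *id;                  /* identifier when op == 0 */
    const struct expr *left, *right;
};

static int temp_count = 0;           /* temporaries t1, t2, ... generated so far */

/* Appends three-address code for e to out (capacity cap) and copies the
   name holding e's value into result (capacity 16). Operands are handled
   before their operator, so each instruction has at most one operator. */
static void gen(const struct expr *e, char *out, size_t cap, char *result) {
    if (e->op == 0) {                /* leaf: its value is the name itself */
        snprintf(result, 16, "%s", e->id);
        return;
    }
    char l[16], r[16];
    gen(e->left, out, cap, l);
    gen(e->right, out, cap, r);
    snprintf(result, 16, "t%d", ++temp_count);   /* new temporary for this node */
    size_t used = strlen(out);
    snprintf(out + used, cap - used, "%s = %s %c %s\n", result, l, e->op, r);
}

/* Translates e into out, numbering temporaries from t1. */
void three_address(const struct expr *e, char *out, size_t cap) {
    char result[16];
    temp_count = 0;
    out[0] = '\0';
    gen(e, out, cap, result);
}
```

For the tree of x + y * z, three_address produces exactly the two instructions shown above, t1 = y * z followed by t2 = x + t1.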
Three-address instructions
• The description of three-address instructions specifies the components of each type of
instruction, but it does not specify how these instructions are represented in a data structure.
• In a compiler, these instructions can be implemented as objects or as records with
fields for the operator and the operands.
• Three such representations are called "quadruples", "triples" and "indirect triples".
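Quadruples
A quadruple has four fields, which we call op, arg1, arg2 and result. The op field contains an
internal code for the operator; for example, t1 = y * z is represented with op = *, arg1 = y,
arg2 = z and result = t1.
Triples
A triple has only three fields, which we call op, arg1 and arg2. The result of an operation is
referred to by the position of its triple rather than by an explicit temporary name.
Indirect Triples
Indirect triples consist of a listing of pointers to triples, rather than a listing of the triples
themselves. For example, an array instruction can list pointers to triples in the desired order;
an optimizing compiler can then move an instruction by reordering this list, without changing
the triples themselves.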
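The relationship between the two forms can be sketched in C by converting quadruples into triples: a temporary defined by the quadruple at position k disappears, and its uses become the reference (k). The structs and the function quads_to_triples are hypothetical, and a copy quadruple (=, src, , dst) is turned into the triple (=, dst, src). An indirect-triple version would add an array of triple positions giving the execution order.

```c
#include <string.h>

/* An operand is either a name or, in triples, a reference (k) to the
   triple at position k. */
struct operand {
    int is_ref;          /* 1: reference to a triple, 0: a name */
    int ref;             /* position of the referenced triple */
    const char *name;    /* "" when unused */
};

struct quad   { char op; const char *arg1, *arg2, *result; };
struct triple { char op; struct operand arg1, arg2; };

/* A name computed by an earlier (non-copy) quadruple becomes a reference
   to that quadruple's position; any other name stays a name. */
static struct operand convert(const char *name, const struct quad *q, int before) {
    struct operand o = { 0, -1, name };
    for (int k = 0; k < before; k++) {
        if (q[k].op != '=' && strcmp(q[k].result, name) == 0) {
            o.is_ref = 1;
            o.ref = k;
            o.name = "";
        }
    }
    return o;
}

/* Converts n quadruples into n triples. */
void quads_to_triples(const struct quad *q, int n, struct triple *t) {
    for (int i = 0; i < n; i++) {
        t[i].op = q[i].op;
        if (q[i].op == '=') {        /* copy: (=, src, , dst) becomes (=, dst, src) */
            t[i].arg1 = (struct operand){ 0, -1, q[i].result };
            t[i].arg2 = convert(q[i].arg1, q, i);
        } else {
            t[i].arg1 = convert(q[i].arg1, q, i);
            t[i].arg2 = convert(q[i].arg2, q, i);
        }
    }
}
```

For the quadruples of a = b + c * d, that is (*, c, d, t1), (+, b, t1, t2) and (=, t2, , a), the result is the triples (*, c, d), (+, b, (0)) and (=, a, (1)).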
1. Translate the following expression A = B * - C + B * - C into quadruples and triples
separately.

2. Differentiate between quadruple, triples and indirect triples.

3. Explain the following terms using the statement -(a + b) * (c + d) + (a + b + c):
(i) Quadruple, (ii) Triples, (iii) Indirect Triples

4. Translate the arithmetic expression a * (b + c / d) into:
(i) syntax tree
(ii) postfix
(iii) 3-address code.

5. Distinguish between quadruples, triples and indirect triples for the expression:
x = y * −z + y * −z

6. Translate the expression a = -(a + b) * (c + d + (a + b + c)) into:
i) Quadruple
ii) Triple
iii) Indirect Triple
iv) 3-address code.

DAY 8
Course: Compiler Design
Relevant MAKAUT syllabus portion: Three-address code

Course Outcomes:

Lecture 8 (60 minutes)

Topics Covered: An Overview of Implementation of Three-Address Statements; Syntax-Directed Translation into Three-Address Code

Prerequisites: Have you Read

• CS 201 (Basic Computation and Principles of C)

• CS302 (Data Structure & Algorithm)

Objectives:
1. To be introduced to the implementation of three-address statements.

Notes:
Types of Three Address Statements
• Declarative Statements
• Assignment Statements
• Arrays
• Boolean Expression
• Flow Control Statement
• Case Statement
• Procedure Call

1. Write the three-address code for the following C program:
int main(void)
{
    int x = 1;
    int y[20];
    while (x <= 20)
        y[x] = 0;
}

2. Consider the following code fragment. Generate the three-address code for it.
switch(a + b){
    case 1: x = x + 1;
    case 2: y = y + 2;
    case 3: z = z + 3;
    default: c = c - 1;
}

3. Write syntax-directed translations for the flow-of-control statements (i) if-then, (ii) if-then-else,
(iii) while and (iv) for. Using these translations, convert the following statement to three-address
code:
if (x > 10) then
while (a > 10)
y=x+a
else if (y > 100)
y = 1;

DAY 9
Course: Compiler Design
Relevant MAKAUT syllabus portion: Three-address code
Lecture 9 (60 minutes)

Topics Covered: Backpatching

Prerequisites: Have you Read

• CS 201 (Basic Computation and Principles of C)

• CS302 (Data Structure & Algorithm)

Objectives:
1. To understand how backpatching is used to generate code for Boolean expressions and
flow-of-control statements in one pass.

Notes:
- “Backpatching is the activity of filling up unspecified information of labels using
appropriate semantic actions during code generation process.”
- Implementing a syntax-directed definition in two passes is the most convenient method.
- If we decide to generate three-address code for a given syntax-directed definition in a single
pass, the main problem is deciding the addresses of the labels.
- Jump (goto) statements refer to these labels, and in a single pass the location of a labelled
statement may not yet be known when a jump to it is generated.
- With two passes, the first pass can leave these addresses unspecified and the second pass can
fill in the incomplete information.
- Backpatching overcomes the problem of incomplete information within a single pass: jumps
whose targets are not yet known are kept on lists, and when a target becomes known, every
jump on the corresponding list is filled in (backpatched) with it. The lists are manipulated with
makelist(i), merge(p1, p2) and backpatch(p, i).
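The three list operations can be sketched in C as follows. Jump targets live in a hypothetical array target, where -1 means not yet known; makelist, merge and backpatch follow the usual textbook meaning, while emit_jump and the fixed array size are assumptions of this sketch.

```c
#include <stdlib.h>

#define MAX_INSTR 100

/* target[i] is the destination of jump instruction i, -1 while unknown. */
static int target[MAX_INSTR];
static int next_instr = 0;        /* index of the next instruction to emit */

/* A list of instruction indices whose targets are still to be filled in. */
struct list { int instr; struct list *next; };

/* Emits a jump with an unknown target and returns its index.
   Assumes fewer than MAX_INSTR instructions. */
int emit_jump(void) {
    target[next_instr] = -1;
    return next_instr++;
}

/* makelist(i): a new list containing only instruction i. */
struct list *makelist(int i) {
    struct list *l = malloc(sizeof *l);
    if (l == NULL)
        abort();
    l->instr = i;
    l->next = NULL;
    return l;
}

/* merge(p1, p2): the concatenation of lists p1 and p2. */
struct list *merge(struct list *p1, struct list *p2) {
    if (p1 == NULL)
        return p2;
    struct list *p = p1;
    while (p->next != NULL)
        p = p->next;
    p->next = p2;
    return p1;
}

/* backpatch(p, i): make i the target of every jump on list p, then free p. */
void backpatch(struct list *p, int i) {
    while (p != NULL) {
        struct list *next = p->next;
        target[p->instr] = i;
        free(p);
        p = next;
    }
}
```

For example, a translator can emit three jumps, put the first and third on a truelist with merge(makelist(j0), makelist(j2)), and later call backpatch(truelist, 10) once it knows that the true exit is instruction 10.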

1. Why do we need backpatching?

2. Using backpatching, generate intermediate code for the following expression:
A < B OR C < D AND P < Q

DAY 10
Course: Compiler Design
Relevant MAKAUT syllabus portion: The principal sources of optimization

Course Outcomes:

Lecture 10 (60 minutes)

Topics Covered: Introduction to Code Optimization, The principal sources of optimization

Prerequisites: Have you Read

• CS 201 (Basic Computation and Principles of C)

• CS302 (Data Structure & Algorithm)

Objectives:
1. To understand the principal sources of code optimization.
Notes:
Principal Sources of Optimization
• Optimization can be done locally or globally. A transformation is local if it is applied within a
single basic block; otherwise it is global.
Function preserving transformations
• There are a number of ways in which a compiler can improve a program without
changing the function it computes.
• Common subexpression elimination, copy propagation, dead-code elimination and
constant folding are common examples of such function-preserving transformations.
1. Compile Time Evaluation
1.1 Folding
1.2 Constant propagation
2. Common Subexpression Elimination
3. Copy Propagation
4. Code Movement (loop-invariant computation)
5. Strength Reduction
6. Dead Code Elimination

Loop Optimization
• Code optimization pays off most in the loops of a program; inner loops in particular are where a
program spends a large amount of its time.
• Hence, reducing the number of instructions in an inner loop can reduce the running time
substantially.

Loop optimization is carried out by the following methods:
1. Code motion
2. Induction variable and strength reduction
3. Loop invariant method
4. Loop unrolling
5. Loop fusion
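Code motion and strength reduction are shown here at the source level for readability; a compiler performs them on intermediate code. In this C sketch (hypothetical function names), the loop-invariant product x * y is hoisted out of the loop, and the multiplication 4 * i is replaced by an induction variable t that grows by 4 on every iteration. Both versions fill the array with the same values.

```c
/* Before: x * y is loop-invariant but recomputed on every iteration,
   and 4 * i costs a multiplication each time. */
void fill_before(int *a, int n, int x, int y) {
    for (int i = 0; i < n; i++)
        a[i] = x * y + 4 * i;
}

/* After code motion (x * y hoisted out of the loop) and strength
   reduction (4 * i replaced by t, which grows by 4 per iteration, so
   t == 4 * i holds at the top of every iteration). */
void fill_after(int *a, int n, int x, int y) {
    int inv = x * y;         /* computed once, before the loop */
    int t = 0;               /* induction variable tracking 4 * i */
    for (int i = 0; i < n; i++) {
        a[i] = inv + t;
        t = t + 4;           /* addition instead of multiplication */
    }
}
```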

1. What will happen if you don’t optimize your code?

2. What is meant by common sub expression? Explain the common sub expression
elimination technique with the help of suitable example.

DAY 11
Course: Compiler Design
Relevant MAKAUT syllabus portion: Basic blocks & flow graphs

Lecture 11 (60 minutes)

Topics Covered: Basic Blocks and Flow Graphs

Prerequisites: Have you Read

• CS 201 (Basic Computation and Principles of C)

• CS302 (Data Structure & Algorithm)

Objectives:
1. To understand basic blocks and flow graphs.

Notes:
Basic Block and Flow Graph

• A graph representation of three-address statements is called a flow graph.
• A flow graph is useful for understanding code-generation algorithms, even if the graph is
not explicitly constructed by a code-generation algorithm.
• Nodes in the flow graph represent computations, and the edges represent the flow of
control.
• Some register-assignment algorithms use flow graphs to find the inner loops where a
program is expected to spend most of its time.

Algorithm: Partition into basic blocks
Input: A sequence of three-address statements.
Output: A list of basic blocks with each three-address statement in exactly one block.
Method:
1. We first determine the set of leaders, the first statements of basic blocks. The rules we use
are the following:
a. The first statement is a leader.
b. Any statement that is the target of a conditional or unconditional goto is a leader.
c. Any statement that immediately follows a goto or conditional goto statement is a
leader.

2. For each leader, its basic block consists of leader and all statements up to but not including
the next leader or end of the program.
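The leader rules and the block-forming step translate almost directly into code. In the C sketch below, an instruction is reduced to its kind (plain, unconditional jump or conditional jump) and its target index; these types and the function partition are hypothetical.

```c
#include <stdbool.h>

/* A three-address instruction, reduced to what partitioning needs. */
enum kind { PLAIN, GOTO, COND_GOTO };
struct instr {
    enum kind kind;
    int target;          /* index of the jump target, unused for PLAIN */
};

/* Sets leader[i] for each instruction i and returns the number of basic
   blocks; block_of[i] receives the block number of instruction i. */
int partition(const struct instr *code, int n, bool *leader, int *block_of) {
    for (int i = 0; i < n; i++)
        leader[i] = false;
    if (n > 0)
        leader[0] = true;                       /* the first statement */
    for (int i = 0; i < n; i++) {
        if (code[i].kind != PLAIN) {
            leader[code[i].target] = true;      /* the target of a jump */
            if (i + 1 < n)
                leader[i + 1] = true;           /* the statement after a jump */
        }
    }
    /* Each block runs from a leader up to, not including, the next leader. */
    int blocks = 0;
    for (int i = 0; i < n; i++) {
        if (leader[i])
            blocks++;
        block_of[i] = blocks - 1;
    }
    return blocks;
}
```

For the hypothetical program (0) i = 0, (1) if i >= 10 goto (5), (2) t = 4 * i, (3) i = i + 1, (4) goto (1), (5) ..., the leaders are 0, 1, 2 and 5, giving four blocks; instructions 2 to 4 form one block.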

Flow Graph
• We can add flow-of-control information to the set of basic blocks making up a
program by constructing a directed graph called a flow graph.
• The nodes of the flow graph are the basic blocks.
• One node is distinguished as initial; it is the block whose leader is the first statement.
• There is a directed edge from block B1 to block B2 if B2 can immediately follow B1 in some
execution sequence; that is, if
1. there is a conditional or unconditional jump from the last statement of B1 to the
first statement of B2, or
2. B2 immediately follows B1 in the order of the program, and B1 does not end in an
unconditional jump.
• We say that B1 is a predecessor of B2, and B2 is a successor of B1.
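Given the basic blocks, the two edge rules can be sketched in C as below, with the blocks described by the indices of their leaders in program order. The types, the adjacency-matrix representation and the limit MAX_BLOCKS are assumptions of this sketch.

```c
#include <stdbool.h>

enum kind { PLAIN, GOTO, COND_GOTO };
struct instr { enum kind kind; int target; };   /* target: index of a jump's destination */

#define MAX_BLOCKS 16

/* Block b covers instructions start[b] .. start[b + 1] - 1 (the last block
   ends at n). Sets edge[b1][b2] when the flow graph has an edge B1 -> B2. */
void build_flow_graph(const struct instr *code, int n, const int *start,
                      int nblocks, bool edge[MAX_BLOCKS][MAX_BLOCKS]) {
    for (int b1 = 0; b1 < nblocks; b1++)
        for (int b2 = 0; b2 < nblocks; b2++)
            edge[b1][b2] = false;

    for (int b = 0; b < nblocks; b++) {
        int end = (b + 1 < nblocks) ? start[b + 1] : n;   /* one past the last statement */
        const struct instr *last = &code[end - 1];
        if (last->kind != PLAIN) {
            /* Rule 1: jump from the last statement to the leader of a block. */
            for (int t = 0; t < nblocks; t++)
                if (start[t] == last->target)
                    edge[b][t] = true;
        }
        /* Rule 2: fall through to the next block unless B ends in an unconditional jump. */
        if (last->kind != GOTO && b + 1 < nblocks)
            edge[b][b + 1] = true;
    }
}
```

For the hypothetical program (0) i = 0, (1) if i >= 10 goto (5), (2) t = 4 * i, (3) i = i + 1, (4) goto (1), (5) ..., with leaders 0, 1, 2 and 5, the edges are B0→B1, B1→B2, B1→B3 and B2→B1; B2 has no fall-through edge because it ends in an unconditional goto.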

1. Draw the flow graph for the following code:
(1) location = -1
(2) i = 0
(3) if i < 100 goto (5)
(4) goto (13)
(5) t1 = 4 * i
(6) t2 = A[t1]
(7) if t2 = x goto (9)
(8) goto (10)
(9) location = i
(10) t3 = i + 1
(11) i = t3
(12) goto (3)
(13) ......

2. Draw the flow graph for the following code:
i) sum := 0
ii) i := 0
iii) t1 := 4 * i
iv) t2 := a[t1]
v) t3 := sum + t2
vi) sum := t3
vii) t4 := i + 1
viii) i := t4
ix) if i <= 10 goto (iii)

3. For the following code segment, construct the flow graph.
sum := 0;
i := 1;
do
    sum := sum + A[i] * B[i];
    i := i + 1;
while i <= 20;

DAY 12
Course: Compiler Design
Relevant MAKAUT syllabus portion: Transformation of basic blocks
Lecture 12 (60 minutes)

Topics Covered: Transformation of basic blocks


Prerequisites: Have you Read

• CS 201 (Basic Computation and Principles of C)

• CS302 (Data Structure & Algorithm)

Objectives:
1. To understand optimization of basic blocks by different transformation techniques.
Notes:
Transformation of Basic Blocks
• There are two important classes of local transformations that can be applied to basic
blocks:
1. Structure-Preserving Transformations
a. Common subexpression elimination
b. Dead-code elimination
c. Renaming of temporary variables
d. Interchange of two independent adjacent statements
2. Algebraic Transformations: for example, statements such as x = x + 0 and x = x * 1 can be
eliminated, and x = y ** 2 can be replaced by the cheaper x = y * y.
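Local common subexpression elimination can be sketched in C by scanning the block once while keeping a table of available expressions: an expression is reused if it is in the table, and assigning to a name removes every table entry that uses or holds that name. The types and the function local_cse are hypothetical, and the sketch ignores commutativity (b + c and c + b are treated as different).

```c
#include <string.h>

#define MAX_EXPRS 32
#define NAME_LEN 8

/* x = y op z, or a copy x = y when op == '='. */
struct tac { char dst[NAME_LEN]; char op; char a[NAME_LEN]; char b[NAME_LEN]; };

/* An available expression: `result` currently holds the value of a op b. */
struct avail { char op; char a[NAME_LEN]; char b[NAME_LEN]; char result[NAME_LEN]; };

/* Local common subexpression elimination on one basic block, in place.
   Assumes at most MAX_EXPRS available expressions at any time. */
void local_cse(struct tac *code, int n) {
    struct avail table[MAX_EXPRS];
    int count = 0;
    for (int i = 0; i < n; i++) {
        struct tac *s = &code[i];
        /* 1. If s's expression is available, replace s by a copy. */
        if (s->op != '=') {
            for (int k = 0; k < count; k++) {
                if (table[k].op == s->op && strcmp(table[k].a, s->a) == 0 &&
                    strcmp(table[k].b, s->b) == 0) {
                    s->op = '=';
                    strcpy(s->a, table[k].result);
                    s->b[0] = '\0';
                    break;
                }
            }
        }
        /* 2. Assigning to s->dst kills every expression that uses or is held in it. */
        int kept = 0;
        for (int k = 0; k < count; k++) {
            if (strcmp(table[k].a, s->dst) != 0 && strcmp(table[k].b, s->dst) != 0 &&
                strcmp(table[k].result, s->dst) != 0)
                table[kept++] = table[k];
        }
        count = kept;
        /* 3. Record s's expression unless s overwrote one of its own operands. */
        if (s->op != '=' && strcmp(s->a, s->dst) != 0 && strcmp(s->b, s->dst) != 0 &&
            count < MAX_EXPRS) {
            struct avail *e = &table[count++];
            e->op = s->op;
            strcpy(e->a, s->a);
            strcpy(e->b, s->b);
            strcpy(e->result, s->dst);
        }
    }
}
```

For the block a = b + c; b = a - d; c = b + c; d = a - d, only the last statement changes: it becomes d = b. The third statement is left alone because b was reassigned after a = b + c, so b + c is no longer available.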

1. Consider some inter-block code optimization without any data flow analysis, treating each
extended basic block as if it were a basic block. Give algorithms to do the following optimizations
within an extended basic block. In each case, indicate what effect a change within one extended
basic block can have on other extended basic blocks.
i) Common sub-expression elimination
ii) Constant folding
iii) Copy propagation


1. a) Define basic block and flow graph.


b) Consider the following code:
(i) i = 1
(ii) j = 1
(iii) t1 = 10 * i
(iv) t2 = t1 + j
(v) t3 = 8 * t2
(vi) t4 = t3 - 88
(vii) a[t4] = 0.0
(viii) j = j + 1
(ix) if j ≤ 10 goto (iii)
(x) i = i + 1
(xi) if i ≤ 10 goto (ii)
(xii) i = 1
(xiii) t5 = i - 1
(xiv) t6 = 88 * t5
(xv) a[t6] = 1.0
(xvi) i = i + 1
(xvii) if i ≤ 10 goto (xiii)
Find the basic blocks and draw the flow graph for the above code.


3. Construct the basic blocks and the data flow graph, and identify the loop-invariant statements:
for(i = 1 to n){
j = 1;
while(j <= n){
A = B * C / D;
j = j + 1;
}
}


DAY 13
Course: Compiler Design
Relevant MAKAUT syllabus portion: The DAG representation of basic blocks

Lecture 13 (60 minutes)

Topics Covered: The DAG representation of basic blocks

Prerequisites: Have you read

• CS 201 (Basic Computation and Principles of C)

• CS302 (Data Structure & Algorithm)

Objectives:
1. To understand how to represent basic blocks using DAGs.
Notes:
The DAG representation of basic blocks
• A Directed Acyclic Graph (DAG) is a useful data structure for implementing transformations
on a basic block.
• A DAG gives a picture of how the value computed by each statement in a basic block is
used in subsequent statements of the basic block.
• Constructing a DAG from three-address statements is a very good way of determining the
common subexpressions within a block, determining which names are used inside the block
but evaluated outside the block, and determining which statements of the block could have
their value used outside the block.
• A DAG for a basic block is a directed acyclic graph with the following labels on nodes:
1. Leaves are labelled by unique identifiers, either variable names or constants.
2. Interior nodes are labelled by an operator symbol.
3. Nodes are also optionally given a sequence of identifiers for labels. The intention is
that interior nodes represent computed values, and the identifiers labelling a node are
deemed to have that value.

DAG construction
1. If the statement is of the form x := y + z, we look for the nodes that represent the “current”
values of y and z. Those could be leaves, or they could be interior nodes of the DAG if y and/or
z has already been evaluated by previous statements of the block.
2. Then, we create a node labelled + and give it two children: y (left child) and z (right child).
3. However, if there is already a node denoting the same value as y + z, we do not add the new
node to the DAG, but rather give the existing node the additional label x.
4. If x (not x0, the initial value of x) had previously labelled some other node, we remove that
label, since the “current” value of x is the node just created.
5. For an assignment such as x := y we do not create a new node. Rather, we append the label x
to the list of names on the node for the “current” value of y.


Algorithm (Constructing a DAG)


Input. A basic block.
Output. A DAG for the basic block containing the following information:
1. A label for each node. For leaves the label is an identifier (constants permitted), and for
interior nodes, an operator symbol.
2. For each node a (possibly empty) list of attached identifiers (constants not permitted here).

Method. Suppose the “current” three-address statement is either (i) x := y op z, (ii) x := op y,
or (iii) x := y. We treat a relational operator like if i <= 20 goto as case (i), with x undefined.
The DAG construction process is to do the following steps (1) through (3) for each statement
of the block, in turn.

Initially, we assume there are no nodes, and node is undefined for all arguments.
1. If node(y) is undefined, create a leaf labeled y, and let node(y) be this node. In case (i), if
node(z) is undefined, create a leaf labeled z and let that leaf be node(z).
2. In case (i), determine if there is a node labeled op whose left child is node(y) and whose right
child is node(z). If not, create such a node. In either event, let n be the node found or
created. In case (ii), determine whether there is a node labeled op whose lone child is
node(y). If not, create such a node, and let n be the node found or created. In case (iii), let
n be node(y).
3. Delete x from the list of attached identifiers for node(x). Append x to the list of attached
identifiers for node n found in (2) and set node(x) to n.
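Steps (1) to (3) can be sketched in Python as follows. This is a minimal sketch under stated assumptions. The `Node` class, `build_dag`, and the encoding of statements are hypothetical: case (i) is `(x, op, y, z)`, case (ii) is `(x, op, y, None)`, and case (iii) is `(x, None, y, None)`. Relational jumps (case (i) with x undefined) are not handled. The search for an existing node in step 2 is a simple linear scan. A real implementation would normally use a hash table keyed on (op, children).

```python
from typing import Dict, List, Tuple


class Node:
    def __init__(self, label: str, children: Tuple["Node", ...] = ()):
        self.label = label            # identifier/constant for a leaf, operator otherwise
        self.children = children      # () for a leaf; (left,) or (left, right) otherwise
        self.ids: List[str] = []      # attached identifiers (names now holding this value)


def build_dag(block):
    """Return (all nodes in creation order, the node(name) map)."""
    nodes: List[Node] = []
    node: Dict[str, Node] = {}        # node(name): the node for name's current value

    def current(name: str) -> Node:
        # Step 1: if node(name) is undefined, create a leaf for its initial value.
        if name not in node:
            leaf = Node(name)
            nodes.append(leaf)
            node[name] = leaf
        return node[name]

    for x, op, y, z in block:
        ny = current(y)
        if op is None:                                   # case (iii): x := y
            n = ny
        else:                                            # cases (i) and (ii)
            kids = (ny,) if z is None else (ny, current(z))
            # Step 2: reuse an existing node with the same operator and children.
            n = next((m for m in nodes
                      if m.label == op and len(m.children) == len(kids)
                      and all(a is b for a, b in zip(m.children, kids))), None)
            if n is None:
                n = Node(op, kids)
                nodes.append(n)
        # Step 3: move the label x from its old node to n.
        if x in node and x in node[x].ids:
            node[x].ids.remove(x)
        n.ids.append(x)
        node[x] = n
    return nodes, node


# Usage: exercise 1 below (d := b * c; e := a + b; b := b * c; a := e - d).
block = [("d", "*", "b", "c"), ("e", "+", "a", "b"),
         ("b", "*", "b", "c"), ("a", "-", "e", "d")]
nodes, node = build_dag(block)
for m in nodes:
    if m.children:
        print(m.label, [c.label for c in m.children], m.ids)
# * ['b', 'c'] ['d', 'b']
# + ['a', 'b'] ['e']
# - ['+', '*'] ['a']
```

The third statement finds the existing `*` node, so `b` is attached to it next to `d`, and no new node is created. The leaf for `b` still denotes the initial value b0, which is what `e := a + b` used.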

1. Draw the DAG for the following basic block:
d := b * c
e := a + b
b := b * c
a := e - d


2. Generate the DAG representation of the following code and list the applications of the DAG
representation:

i = 1, s = 0;
while(i < 10){
s = s + a[i][j];
i = i + 1;
}


3. Construct the DAG for the following basic block:

T1 = A + B
T2 = C + D
T3 = E - T2
T4 = T1 - T2


DAY 14
Course: Compiler Design
Relevant MAKAUT syllabus portion: Loops in Flow Graph
Course Outcomes:

Lecture 14 (60 minutes)

Topics Covered: Loops in Flow Graph: Dominators, Inner loop, Reducible Flow Graphs

Prerequisites: Have you read
• CS 201 (Basic Computation and Principles of C)

• CS302 (Data Structure & Algorithm)

Objectives:
1. To understand what constitutes a loop in a flow graph.

Notes:
Dominators
• We say node d of a flow graph dominates node n, written as d dom n, if every path from the
initial node of the flow graph to n goes through d.
• Under this definition, every node dominates itself, and the entry of a loop dominates all
the nodes in the loop.
• A useful way of presenting dominator information is in a tree, called the dominator tree,
in which the initial node is the root, and each node d dominates only its descendants in
the tree.
• The existence of dominator trees follows from a property of dominators: each node n other
than the initial node has a unique immediate dominator m that is the last dominator of n on any
path from the initial node to n. In terms of the dom relation, the immediate dominator m has the
property that if d != n and d dom n, then d dom m.
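The notes define dominators but do not say how to compute them. One standard method, not given in these notes, solves D(entry) = {entry} and D(n) = {n} ∪ (∩ D(p) over all predecessors p of n) iteratively until nothing changes. Below is a minimal Python sketch under that assumption. The flow graph is a hypothetical successor dictionary, and `dominators` and `immediate_dominators` are illustrative names. The immediate dominator is picked using the property stated above: the strict dominators of n form a chain, and the immediate dominator is the last one, that is, the one with the largest dominator set of its own.

```python
from typing import Dict, Hashable, List, Set


def dominators(succ: Dict[Hashable, List[Hashable]], entry) -> Dict[Hashable, Set]:
    """Iteratively solve D(entry) = {entry}, D(n) = {n} | intersection of D(p) over preds p."""
    nodes = set(succ) | {s for ss in succ.values() for s in ss}
    pred = {n: set() for n in nodes}
    for n, ss in succ.items():
        for s in ss:
            pred[s].add(n)
    dom = {n: set(nodes) for n in nodes}   # start from "every node dominates n"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            pds = [dom[p] for p in pred[n]]
            new = {n} | (set.intersection(*pds) if pds else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def immediate_dominators(dom):
    """idom(n): the strict dominator of n with the largest dominator set of its own."""
    return {n: max(ds - {n}, key=lambda d: len(dom[d]))
            for n, ds in dom.items() if len(ds) > 1}


# Hypothetical flow graph: 1 -> 2 -> {3, 4} -> 5, with 5 -> 2 (a loop) and 5 -> 6.
succ = {1: [2], 2: [3, 4], 3: [5], 4: [5], 5: [2, 6], 6: []}
idom = immediate_dominators(dominators(succ, 1))
print(sorted(idom.items()))   # [(2, 1), (3, 2), (4, 2), (5, 2), (6, 5)]
```

Read as child-to-parent links, the `idom` map is exactly the dominator tree, rooted at the initial node.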

Natural Loops
One important application of dominator information is in determining the loops of a flow
graph suitable for improvement. There are two essential properties of such loops.
1. A loop must have a single entry point, called the "header". This entry
point dominates all nodes in the loop, or it would not be the sole entry to the loop.
2. There must be at least one way to iterate the loop, i.e., at least one path back to the header.
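Both properties are met by the usual construction of the natural loop of a back edge n → d, where the header d dominates n. The loop is d together with every node that can reach n without passing through d. Below is a minimal Python sketch of that construction. It assumes that the given edge really is a back edge, and the dictionary format and the name `natural_loop` are hypothetical. It walks predecessors backwards from the tail and stops at the header.

```python
from typing import Dict, Hashable, List, Set


def natural_loop(succ: Dict[Hashable, List[Hashable]], tail, header) -> Set:
    """Natural loop of the back edge tail -> header: the header plus every node
    that can reach the tail without passing through the header."""
    pred: Dict[Hashable, Set] = {}
    for n, ss in succ.items():
        for s in ss:
            pred.setdefault(s, set()).add(n)
    loop = {header, tail}
    stack = [tail] if tail != header else []
    while stack:
        m = stack.pop()
        for p in pred.get(m, ()):
            if p not in loop:          # the header is already in loop, so the walk stops there
                loop.add(p)
                stack.append(p)
    return loop


# Hypothetical flow graph with the back edge 5 -> 2 (node 2 dominates node 5).
succ = {1: [2], 2: [3, 4], 3: [5], 4: [5], 5: [2, 6], 6: []}
print(sorted(natural_loop(succ, 5, 2)))   # [2, 3, 4, 5]
```

Node 1 is outside the loop because every path from node 1 into the loop enters through the header 2, which matches property 1.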

Inner Loops
• A natural notion of an inner loop is one that contains no other loops.
• When two loops have the same header, it is hard to tell which is the inner loop.

Pre-header


• Several transformations require us to move statements “before the header.”
• We therefore begin treatment of a loop L by creating a new block, called the preheader.
• The preheader has only the header as successor, and all edges which formerly entered the
header of L from outside L instead enter the preheader. Edges from inside loop L to the
header are not changed.
• Initially, the preheader is empty, but transformations on L may place statements in it.

Reducible flow graph


A flow graph G is reducible if and only if we can partition its edges into two disjoint groups,
often called the forward edges and the back edges, with the following properties:
1. The forward edges form an acyclic graph in which every node can be reached from the
initial node of G.
2. The back edges consist only of edges whose heads dominate their tails.
• Flow graphs that occur in practice frequently fall into the class of reducible flow graphs.
• Exclusive use of structured flow-of-control statements such as if-then-else, while-do,
continue, and break statements produces programs whose flow graphs are always
reducible.
• Even programs written using goto statements by programmers with no prior
knowledge of structured program design are almost always reducible.
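The definition suggests a direct test: classify an edge n → s as a back edge when s dominates n, and check that the remaining forward edges contain no cycle. Below is a minimal Python sketch of that test. It assumes the dominator sets are already available (passed in as `dom`) and that every node is reachable from the initial node. In that case the reachability part of property 1 holds automatically whenever the forward graph is acyclic. `is_reducible` is a hypothetical name, and the cycle check uses Kahn's topological-sort algorithm.

```python
def is_reducible(succ, dom) -> bool:
    """succ: node -> successors; dom: node -> set of its dominators.
    Edges n -> s with s in dom[n] are back edges; the graph is reducible
    if the remaining (forward) edges form an acyclic graph."""
    nodes = set(succ) | {s for ss in succ.values() for s in ss}
    forward = {n: [s for s in succ.get(n, []) if s not in dom[n]] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for n in nodes:
        for s in forward[n]:
            indegree[s] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:                        # Kahn's algorithm: peel off nodes with no incoming edges
        n = ready.pop()
        removed += 1
        for s in forward[n]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return removed == len(nodes)        # a cycle leaves some nodes never removed


# Classic irreducible graph: 1 -> 2, 1 -> 3, 2 <-> 3. Neither 2 nor 3 dominates the other,
# so neither edge of the 2-3 cycle is a back edge and the cycle survives.
print(is_reducible({1: [2, 3], 2: [3], 3: [2]}, {1: {1}, 2: {1, 2}, 3: {1, 3}}))   # False
```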

1. What are the sources of redundancy in code? Give examples using flow graphs.


2. When is a flow graph said to be reducible? What are the properties of natural loops?


3. What is the significance of loop optimization techniques?


4. For the given flow graph:
i) Compute the dominator relation.
ii) Find the immediate dominator of each node.
iii) Construct the dominator tree.
iv) Find the depth-first ordering of the graph.
v) Is the flow graph reducible?
vi) Find the natural loops of the flow graph.


DAY 15
Course: Compiler Design
Relevant MAKAUT syllabus portion: Peephole optimization

Course Outcomes:

Lecture 15 (60 minutes)

Topics Covered: Peephole optimization


Prerequisites: Have you read

• CS 201 (Basic Computation and Principles of C)

• CS302 (Data Structure & Algorithm)

Objectives:
1. To optimize the target program using the peephole optimization technique.

Notes
Peephole Optimization
• A statement-by-statement code-generation strategy often produces target code that
contains redundant instructions and suboptimal constructs. The quality of such target code
can be improved by applying “optimizing” transformations to the target program.
• Peephole optimization is an effective technique for locally improving the target code.
• It examines a short sequence of target instructions (called the peephole) and replaces these
instructions by a shorter or faster sequence whenever possible.
• The technique can also be applied directly after intermediate code generation to improve the
intermediate representation.
• The peephole is a small, moving window on the target code.
• It is characteristic of peephole optimization that each improvement may spawn
opportunities for additional improvement. In general, repeated passes over the target code
are necessary to get the maximum benefit.
Transformations that are characteristic of Peephole Optimization
• redundant-instruction elimination
• flow-of-control optimizations
• algebraic simplifications
• use of machine idioms
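Here is a minimal Python sketch of two of these transformations: redundant-instruction elimination and algebraic simplification. It repeats passes until nothing changes, as the notes recommend. The instruction format is a hypothetical `(opcode, src, dst)` tuple in the `MOV src, dst` notation used later in these notes, and a label is written as `("LABEL", name, None)`. A load or store directly after its inverse is dropped only when no label separates the two, because a jump to the label could make the second instruction necessary. The sketch also assumes that `ADD #0, R` and `MUL #1, R` have no side effects the program relies on, such as condition codes.

```python
def peephole(code):
    """code: list of (opcode, src, dst) tuples; labels are ("LABEL", name, None)."""
    changed = True
    while changed:                          # repeat passes until no window changes anything
        changed = False
        out = []
        for ins in code:
            op, src, dst = ins
            # Algebraic simplification: ADD #0, R and MUL #1, R leave R unchanged.
            if (op, src) in {("ADD", "#0"), ("MUL", "#1")}:
                changed = True
                continue
            # Redundant load/store: MOV R, a immediately followed by MOV a, R (or vice versa).
            if out and op == "MOV" and out[-1] == ("MOV", dst, src):
                changed = True
                continue
            out.append(ins)
        code = out
    return code


code = [("MOV", "R0", "a"), ("ADD", "#0", "R0"), ("MOV", "a", "R0"), ("MOV", "R0", "b")]
print(peephole(code))   # [('MOV', 'R0', 'a'), ('MOV', 'R0', 'b')]
```

Removing `ADD #0, R0` makes the store `MOV R0, a` and the load `MOV a, R0` adjacent, which then exposes the redundant load. This is an instance of one improvement spawning another.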


1. Why do we use the peephole optimization technique?


DAY 16
Course: Compiler Design
Relevant MAKAUT syllabus portion: Issues in the design of code generator

Lecture 16 (60 minutes)

Topics Covered: Issues in the Design of Code Generator

Prerequisites: Have you read

• CS 201 (Basic Computation and Principles of C)

• CS302 (Data Structure & Algorithm)

Objectives:
1. To understand the concepts of code generation and the issues in the design of a code generator.

Notes:

Code Generator
It takes as input an intermediate representation of the source code and produces as output an
equivalent target program.
Code generation is the process of creating assembly-language/machine-language statements
that perform the operations specified by the source program when they run.
Properties:
• Correctness.
• High quality.
• Efficient use of the resources of the target machine.
• Quick code generation.

Issues in Design of Code Generator


1. Input to the code generator
2. Target programs
3. Memory management
4. Instruction selection
5. Register allocation
6. Choice of evaluation order
7. Approaches to code generation


1. Briefly explain the issues in the design of a code generator.


2. What are the various modes used during code generation?

3. Compute the cost of the following set of instructions.
MOV *R1, *R0
ADD *R2, *R0


DAY 17
Course: Compiler Design
Relevant MAKAUT syllabus portion: A simple code generator

Course Outcomes:

Lecture 17 (60 minutes)

Topics Covered: Next-Use Information, A Simple Code Generator

Prerequisites: Have you read

• CS 201 (Basic Computation and Principles of C)

• CS302 (Data Structure & Algorithm)

Objectives:
1. To understand the construction of a simple code generator.

Notes:
Next-Use Information
• Next-use information is needed for dead-code elimination and register assignment.
• Next-use information is computed by a backward scan of a block, performing the following
actions on each statement:
Algorithm
i : x := y op z
- attach to statement i the current liveness/next-use information of x, y and z
- set x to “not live” and “no next use”
- set y and z to “live” and the next use of y and z to i.
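Here is a minimal Python sketch of this backward scan. The statement encoding `(x, y, z)` for `x := y op z` (with `z = None` for a unary operator), the 0-based statement numbers, and the `live_on_exit` set are assumptions for illustration, and `next_use` is a hypothetical name. The order of the three actions matters. Because x is marked dead before y and z are marked live, a statement like `x := x + 1` correctly leaves x live.

```python
from typing import Dict, List, Optional, Set, Tuple

Info = Tuple[bool, Optional[int]]   # (live?, index of next use or None)


def next_use(block: List[Tuple[str, str, Optional[str]]],
             live_on_exit: Set[str]) -> List[Dict[str, Info]]:
    """block[i] = (x, y, z) stands for statement i: x := y op z (z may be None).
    Returns, for each statement, the liveness/next-use information of its names
    as it stands just after that statement."""
    names = {n for stmt in block for n in stmt if n is not None}
    table: Dict[str, Info] = {n: (n in live_on_exit, None) for n in names}
    info: List[Dict[str, Info]] = [{} for _ in block]
    for i in range(len(block) - 1, -1, -1):                # backward scan
        x, y, z = block[i]
        operands = [n for n in (y, z) if n is not None]
        info[i] = {n: table[n] for n in [x, *operands]}    # attach current info to i
        table[x] = (False, None)                           # x: not live, no next use
        for n in operands:
            table[n] = (True, i)                           # y, z: live, next use at i
    return info


# t1 := a + b; t2 := t1 * c; d := t2 + t1, with only user variables live on exit.
block = [("t1", "a", "b"), ("t2", "t1", "c"), ("d", "t2", "t1")]
info = next_use(block, live_on_exit={"a", "b", "c", "d"})
print(info[0]["t1"])   # (True, 1): after statement 0, t1 is next used by statement 1
```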

Simple Code Generator


• Generates target code for a sequence of three-address statements
- Next-use information is used
• For each operator in a statement there is a corresponding target-language operator (opcode).
• Uses a function getreg to assign registers to variables.
• Computed results are kept in registers as long as possible, which means:
- A result is stored only if its register is needed for another computation, or
- just before a procedure call or the end of the block, to avoid errors.
- The generator checks whether the operands of a three-address statement are already available in registers.

Data structure used


• Register Descriptor – used to keep track of which variable is currently stored in each
register at a particular point in the code.
e.g. MOV a, R0 “R0 contains a”
• Address Descriptor – used to keep track of the locations where the current value of
a variable (e.g. a local variable, argument, global variable etc.) can be found at run time.
e.g. MOV a, R0
MOV R0, R1 “a in R0 and R1”

The Code Generation Algorithm


Input: Sequence of 3-address statements from a basic block. For each statement x := y op z:
• Set location L = getreg(y, z) to store the result of y op z.
• If y is not already in L, then generate
MOV y’, L
where y’ denotes one of the locations where the value of y is available – choose a register if
possible.
• Generate the instruction OP z’, L where z’ is a current location of z (again, prefer a register).
• Update the address descriptor of x to indicate that x is in L, and, if L is a register, update
its register descriptor to indicate that it holds only the value of x.
• If the current values of y and/or z have no next uses, are not live on exit from the block,
and are in registers, alter the register descriptors to indicate that, after execution of x := y
op z, those registers no longer will contain y and/or z, respectively.

getreg() algorithm
1. If y is stored in a register R, R holds only the value of y, and y has no next use and is not
live after x := y op z, then return R;
update the address descriptor of y: the value of y is no longer in R.
2. Else, return a new empty register if one is available.
3. Else, find an occupied register R;
store its contents (register spill) by generating
MOV R, M
for every variable M held in R whose value is not already in memory;
return register R.
4. Else, return a memory location.
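The algorithm and getreg() can be sketched together in Python. This is a minimal sketch under several assumptions. `SimpleCodeGen`, `OPCODES`, `nregs` and the `y_dead`/`z_dead` flags are all hypothetical; the flags stand for "no next use and not live on exit" and would come from the next-use information. Rule 3 naively spills the lowest-numbered register instead of choosing a good victim, so rule 4 is never reached. Live variables are also not stored back to memory at the end of the block. Instructions use the `OP src, dst` notation of these notes.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


class SimpleCodeGen:
    def __init__(self, nregs: int):
        self.regs = {f"R{i}": set() for i in range(nregs)}  # register descriptor
        self.addr = {}                                       # address descriptor
        self.code = []

    def where(self, name):
        # Locations holding name's current value; a name never seen is only in memory.
        return self.addr.setdefault(name, {name})

    def loc(self, name):
        # Prefer a register holding name, else its memory location.
        in_regs = sorted(r for r in self.where(name) if r in self.regs)
        return in_regs[0] if in_regs else name

    def getreg(self, y, y_dead):
        r = self.loc(y)
        if r in self.regs and self.regs[r] == {y} and y_dead:
            return r                                   # rule 1: reuse y's register
        for r in sorted(self.regs):
            if not self.regs[r]:
                return r                               # rule 2: an empty register
        r = sorted(self.regs)[0]                       # rule 3: spill (naive victim choice)
        for v in sorted(self.regs[r]):
            if self.where(v) == {r}:                   # r holds the only copy of v
                self.code.append(f"MOV {r}, {v}")
                self.where(v).add(v)
            self.where(v).discard(r)
        self.regs[r] = set()
        return r

    def gen(self, x, op, y, z, y_dead=False, z_dead=False):
        L = self.getreg(y, y_dead)
        if self.loc(y) != L:
            self.code.append(f"MOV {self.loc(y)}, {L}")
        self.code.append(f"{OPCODES[op]} {self.loc(z)}, {L}")
        for v in self.regs[L]:
            self.where(v).discard(L)                   # L no longer holds its old names
        for held in self.regs.values():
            held.discard(x)                            # older register copies of x are stale
        self.regs[L] = {x}
        self.addr[x] = {L}                             # x now lives only in L
        for v, dead in ((y, y_dead), (z, z_dead)):
            if dead and v != x:                        # free registers of dead operands
                for r in [r for r in self.where(v) if r in self.regs]:
                    self.regs[r].discard(v)
                    self.where(v).discard(r)


# t := a - b; u := a - c; v := t + u; d := v + u, with two registers.
g = SimpleCodeGen(2)
g.gen("t", "-", "a", "b")
g.gen("u", "-", "a", "c")
g.gen("v", "+", "t", "u", y_dead=True)
g.gen("d", "+", "v", "u", y_dead=True, z_dead=True)
print(g.code)
# ['MOV a, R0', 'SUB b, R0', 'MOV a, R1', 'SUB c, R1', 'ADD R1, R0', 'ADD R1, R0']
```

At the end of this block `d` is only in R0, so a complete generator would still emit `MOV R0, d` if `d` is live on exit.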

1. Write and explain the algorithm for computing next-use information.


2. Write the simple code generation algorithm.


3. Generate code for the following C statement for the target machine, assuming all variables are
static.
x = a / (b + c) - d * (e + f)


4. Generate code for the following C program:


int main(void) {
int i = 0;
int a[10];
while (i < 10) {
a[i] = 0;
i = i + 1;
}
return 0;
}


DAY 18
Course: Compiler Design
Relevant MAKAUT syllabus portion: Register allocation & assignment

Course Outcomes:

Lecture 18 (60 minutes)

Topics Covered: Register allocation & assignment

Prerequisites: Have you read

• CS 201 (Basic Computation and Principles of C)

• CS302 (Data Structure & Algorithm)

Objectives:
1. To understand register allocation and assignment during code generation.

Notes:
• Global register allocation assigns variables to a limited number of available registers and
attempts to keep these registers consistent across basic block boundaries.
• Suppose loading a variable x has a cost of 2.
• Suppose storing a variable x has a cost of 2.
• The benefit of allocating a register to a variable x within loop L is
$\sum_{B \in L} \left( use(x, B) + 2 \cdot live(x, B) \right)$
where use(x, B) is the number of times x is used in B before any definition of x in B, and
live(x, B) is 1 if x is live on exit from B and is assigned a value in B, and 0 otherwise.
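The benefit formula is plain arithmetic, so a short Python sketch is enough to check a hand calculation. `register_benefit` is a hypothetical name, and the input is one `(use(x, B), live(x, B))` pair per block B of the loop, for a single variable x.

```python
from typing import List, Tuple


def register_benefit(blocks: List[Tuple[int, bool]]) -> int:
    """One (use(x, B), live(x, B)) pair per block B of loop L, for a fixed variable x.
    Each use counts 1; each avoided store at a block exit counts 2."""
    return sum(uses + 2 * int(live) for uses, live in blocks)


# Hypothetical loop with three blocks: x used once in B1; assigned and live on exit
# in B2; used twice, then assigned and live on exit, in B3.
print(register_benefit([(1, False), (0, True), (2, True)]))   # 1 + 2 + (2 + 2) = 7
```

Computing this value for every variable in the loop and giving the available registers to the variables with the largest benefits is the usual way the formula is applied.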

Global Register Allocation – Graph Coloring


• When a register is needed but all available registers are in use, the contents of one of the
used registers must be stored to free a register; this is called spilling.
• Graph coloring allocates registers and attempts to minimize the cost of spills.
• Build an interference graph based on how variables interfere with each other.
• Find a k-coloring for the graph, where k is the number of registers.

Register interference graph


• Nodes are symbolic registers.
• An edge connects two nodes if one is live at a point where the other is defined.
Keeping variables in registers in loops can be beneficial.
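Here is a minimal Python sketch of both steps. It builds the interference graph from the definition above and then colours it with a simplify/select scheme in the style of Chaitin. Several things are assumptions. Variable names stand in for symbolic registers. Each program point is given as a pair (name defined there, names live just after it). `interference_graph` and `color` are hypothetical names. When no node has fewer than k neighbours, the sketch picks the node with the most neighbours as the spill candidate, which is a simple heuristic rather than a real spill-cost estimate.

```python
from typing import Dict, Iterable, List, Set, Tuple


def interference_graph(points: Iterable[Tuple[str, Set[str]]]) -> Dict[str, Set[str]]:
    """points: (name defined at the point, names live just after it).
    An edge joins the defined name and every other name live at that point."""
    graph: Dict[str, Set[str]] = {}
    for defined, live in points:
        graph.setdefault(defined, set())
        for v in live:
            graph.setdefault(v, set())
            if v != defined:
                graph[defined].add(v)
                graph[v].add(defined)
    return graph


def color(graph: Dict[str, Set[str]], k: int):
    """Simplify/select k-coloring; returns (color of each node, spilled nodes)."""
    g = {n: set(adj) for n, adj in graph.items()}
    stack: List[str] = []
    while g:
        # A node with fewer than k neighbours can always be colored later, so remove it.
        n = next((m for m in sorted(g) if len(g[m]) < k), None)
        if n is None:
            n = max(sorted(g), key=lambda m: len(g[m]))   # spill candidate: most neighbours
        stack.append(n)
        for m in g.pop(n):
            g[m].discard(n)
    coloring: Dict[str, int] = {}
    spilled: List[str] = []
    while stack:                                          # select: reinsert in reverse order
        n = stack.pop()
        used = {coloring[m] for m in graph[n] if m in coloring}
        free = [c for c in range(k) if c not in used]
        if free:
            coloring[n] = free[0]
        else:
            spilled.append(n)                             # no register left: keep n in memory
    return coloring, spilled


# a := ...; b := ...; c := a + b; d := c + 1  (live sets just after each definition)
graph = interference_graph([("a", {"a"}), ("b", {"a", "b"}), ("c", {"c"}), ("d", {"d"})])
print(color(graph, 2)[1])   # []: two registers are enough
```

Only `a` and `b` interfere, because `a` is still live where `b` is defined. With three mutually interfering names and k = 2, one of them is reported as spilled.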


1. What are the uses of register and address descriptors in code generation?


2. Generate machine code for the following statement:
x = a / -(b * c) - d
Assume 3 registers are available.

3. For the following expression, obtain optimal code using i) only two registers, ii) only one register:
(a + b) - (c - (d + e))
