You are on page 1of 47

CS8602

Compiler Design

Prof. SAM 1
UNIT I: INTRODUCTION TO COMPILERS 9 CS8602 Syllabus Compiler Design
Structure of a compiler – Lexical Analysis – Role of Lexical Analyzer – Input Buffering – Specification of
Tokens – Recognition of Tokens – Lex – Finite Automata – Regular Expressions to Automata –
Minimizing DFA.

UNIT II: SYNTAX ANALYSIS 12 CS8602 Syllabus Compiler Design


Role of Parser – Grammars – Error Handling – Context-free grammars – Writing a grammar – Top Down
Parsing – General Strategies Recursive Descent Parser Predictive Parser-LL(1) Parser-Shift Reduce Parser-
LR Parser-LR (0)Item Construction of SLR Parsing Table -Introduction to LALR Parser – Error Handling
and Recovery in Syntax Analyzer-YACC.

UNIT III: INTERMEDIATE CODE GENERATION 8 CS8602 Syllabus Compiler Design


Syntax Directed Definitions, Evaluation Orders for Syntax Directed Definitions, Intermediate Languages:
Syntax Tree, Three Address Code, Types and Declarations, Translation of Expressions, Type Checking.

UNIT IV: RUN-TIME ENVIRONMENT AND CODE GENERATION 8 CS8602 Syllabus Compiler
Design
Storage Organization, Stack Allocation Space, Access to Non-local Data on the Stack, Heap Management –
Issues in Code Generation – Design of a simple Code Generator.

UNIT V: CODE OPTIMIZATION 8 CS8602 Syllabus Compiler Design


Principal Sources of Optimization – Peep-hole optimization – DAG- Optimization of Basic Blocks-Global
Data Flow Analysis – Efficient Data Flow Algorithm.

Prof. SAM 2
Test Book

Compilers
: Principles, Techniques, and Tools (2nd Editio
n)
,
by
Alfred V. Aho, Monica S. Lam, Ravi Sethi,
Jeffrey D. Ullman.

Prof. SAM 3
Examples for Translators & Compilers
Translators
• Interpreters
• Assemblers
• Compilers
Compilers
• Cross Compilers
• One pass Compilers
• Multi pass Compilers
• Source to Source Compilers
• Silicon Compilers

Prof. SAM 4
Compilers
• “Compilation”
– Translation of a program written in a source
language into a semantically equivalent program
written in a target language
Input

Source Target
Compiler
Program Program

Error messages Output


Prof. SAM 5
Interpreters
• “Interpretation”
– Performing the operations implied by the source
program

Source
Program
Interpreter Output
Input

Error messages

Prof. SAM 6
Cross Compilers
• A cross compiler is a compiler capable of
creating executable code for a platform other
than the one on which the compiler is
running.

For example, a compiler that runs on a 


Windows 7 PC but generates code that runs
on Android smartphone is a cross compiler.

Prof. SAM 7
Preprocessors, Compilers, Assemblers,
and Linkers
Skeletal Source Program

Preprocessor
Source Program
Try for example:
Compiler gcc -S myprog.c
Target Assembly Program javap Class

Assembler
Relocatable Object Code
Linker Libraries and
Relocatable Object Files
Absolute Machine
Prof. SAM
Code 8
Analysis of Source Programs
source program

Linear Analysis - Scanner lexical analyzer


tokens
Hierarchical Analysis - Parser
syntax analyzer
parse trees
Semantic Analysis
semantic analyzer
parse trees

Prof. SAM 9
Lexical Analysis

tokens

Prof. SAM 10
Syntax Analysis

parse tree

Prof. SAM 11
Semantic Analysis
type checking
type conversion

Prof. SAM 12
Symbol Table

• There is a record for each identifier


• The attributes include name, type, location, etc.

Prof. SAM 13
Tools for Analysis: Examples
• Structure Analysis – Automatic structure
• Pretty Printers – Analyze and prints comment
lines in special font
• Static Checkers – find potential bugs without
running a program
• Interpreters – performs operation implied by
the source program

Prof. SAM 14
Synthesis of Object Code
parse tree & symbol table

intermediate code generator


intermediate code

code optimizer
optimized intermediate code
code generator
target program
Prof. SAM 15
Intermediate Code Generation

Prof. SAM 16
Code Optimization

Prof. SAM 17
Code Generation

Prof. SAM 18
Model of A Compiler
• A compiler must perform two tasks:
- analysis of source program: The analysis part breaks up
the source program into constituent pieces and imposes a
grammatical structure on them. It then uses this structure to
create an intermediate representation of the source program.
- synthesis of its corresponding program: constructs the
desired target program from the intermediate representation
and the information in the symbol table.
• The analysis part is often called the front end of the
compiler; the synthesis part is the back end.

Prof. SAM 19
Phases of a Compiler
• Compiler operates in phases
• Each phase transforms source program from
one representation to another

Prof. SAM 20
Phases of the Compiler
Source Program

1
Lexical Analyzer

2
Syntax Analyzer

3
Semantic Analyzer

Symbol-table Error Handler


Manager
4 Intermediate
Code Generator

5
Code Optimizer

6
Code Generator

TargetProf.
Program
SAM 21
Phases of Modern Compiler

Error
handler

Prof. SAM 22
Lexical Analysis (scanner): The first phase of a compiler
 Reads the characters in the source program and groups them into
tokens.
 Token represents an Identifier, or a keyword, a punctuation
character or a operator.
 The character Sequence forming a token is called the lexeme for
the token.
 The lexical Analyzer not only generate tokens but also it enters the
lexeme into the symbol table
(token-name, attribute-value)
• Token-name: an abstract symbol is used during syntax analysis, an
• attribute-value: points to an entry in the symbol table for this
token.

Prof. SAM 23
Example: position =initial + rate * 60

1.”position” is a lexeme mapped into a token (id, 1), where id is an abstract symbol
standing for identifier and 1 points to the symbol table entry for position. The
symbol-table entry for an identifier holds information about the identifier, such as its
name and type.
2. = is a lexeme that is mapped into the token (=). Since this token needs no attribute-
value, we have omitted the second component. For notational convenience, the
lexeme itself is used as the name of the abstract symbol.
3. “initial” is a lexeme that is mapped into the token (id, 2), where 2 points to the
symbol-table entry for initial.
4. + is a lexeme that is mapped into the token (+).
5. “rate” is a lexeme mapped into the token (id, 3), where 3 points to the symbol-table
entry for rate.
6. * is a lexeme that is mapped into the token (*) .
7. 60 is a lexeme that is mapped into the token (60)
Blanks separating the lexemes would be discarded by the lexical analyzer.

Prof. SAM 24
Syntax Analysis (parser) : The second phase of the compiler

• The parser uses the first components of the tokens produced by the lexical
analyzer to create a tree-like intermediate representation that depicts the
grammatical structure of the token stream.
• A typical representation is a syntax tree in which each interior node
represents an operation and the children of the node represent the
arguments of the operation

Prof. SAM 25
Syntax Analysis Example

Pay := Base + Rate* 60


 The seven tokens are grouped into a parse tree

Assignment stmt

identifier := expression

pay expression expression


+
identifier
Rate*60
base
Prof. SAM 26
Semantic Analysis: Third phase of the compiler

 It uses the syntax tree and the information in the symbol table
to check the source program for semantic consistency with the
language definition.
 It gathers type information and saves it in either the syntax
tree or the symbol table, for subsequent use during
intermediate-code generation.
 An important part of semantic analysis is type checking,
where the compiler checks that each operator has matching
operands.
 For example, many programming language definitions require an array
index to be an integer; the compiler must report an error if a floating-
point number is used to index an array.
 The language specification may permit some type conversions
called coercions. For example, a binary arithmetic operator
may be applied to either a pair of integers or to a pair of
floating-point numbers.

Prof. SAM 27
Intermediate Code Generation: three-address code

After syntax and semantic analysis of the source program, many compilers
generate an explicit low-level or machine-like intermediate representation
(a program for an abstract machine).
This intermediate representation should have two important properties:
– it should be easy to produce and
– it should be easy to translate into the target machine.
Intermediate representations are of the form,
syntax tree, TAC and Postfix notation.
The considered intermediate form called three-address code, which consists
of a sequence of assembly-like instructions with three operands per
instruction. Each operand can act like a register.

Prof. SAM 28
Code Optimization: to generate better target code

• The machine-independent code-optimization phase attempts to improve the


intermediate code so that better target code will result.
• Usually better means:
– faster, shorter code, or target code that consumes less power.
• The optimizer can deduce that the conversion of 60 from integer to floating point
can be done once and for all at compile time, so the int to float operation can be
eliminated by replacing the integer 60 by the floating-point number 60.0.
Moreover, t3 is used only once

• There are simple optimizations that significantly improve the running time of the
target program without slowing down compilation too much .
• It involves:
• - Detection and removal of dead code.
- Calculation of constant expressions and terms.
- Moving code outside of loops.
- Removal of unnecessary temporary variables.
Prof. SAM 29
Code Generation
• It takes as input an intermediate representation of the source program and
maps it into the target language
• If the target language is machine, code, registers or memory locations are
selected for each of the variables used by the program.
• Then, the intermediate instructions are translated into sequences of
machine instructions that perform the same task.
• A crucial aspect of code generation is the careful assignment of registers
to hold variables.
• This involves:
– Allocation of Registers and Memory.
– Generation of correct References.
– Generation of correct types.
– Generation of machine code.

Prof. SAM 30
Symbol Table
• It is a data structure containing a record for each identifier, with fields for
the attributes of the identifier.
• The data structure allows us to find the record for each identifier quickly
and to store or retrieve data from that record quickly.

• When an identifier in the source program is detected by the lexical


Analyzer, that identifier is entered into the symbol table with the attributes.

• These attributes may provide information about the storage allocated for a
name, its type, its scope (where in the program its value may be used).
• In the case of procedure names, such things as the number and types of its
arguments, the method of passing each argument (for example, by value or
by reference), and the type returned.

Prof. SAM 31
Error Detection and Reporting
• Each phase can encounter some errors.
• After detecting an error, a phase must deal with that error, so that
compilation can proceed allowing further errors in the source program to
be detected.
Eg:
i) Lexical Errors : (Don’t form any tokens)
“in, floa, switc etc.,”
ii) Syntax Errors: (Token stream violates the structure rules of the
language).
“ Missing of parenthesis, braces etc.,”

iii) Semantic Errors:(No meaning to the operation involved)


“a is not used”;

Prof. SAM 32
Example
Consider the following statement position=initial
+ rate*60
Show the output of each phase.

Prof. SAM 33
Example
position := initial + rate * 60

lexical analyzer
id1 := id2 + id3 * 60
syntax analyzer
:=
id1 +
id2 *
id3 60
semantic analyzer
:=
Symbol + E
Table
id1 r
id2l *
r
position .... id3 inttoreal o
60 r
initial ….
s
intermediate code generator
rate…. Prof. SAM 34
Example
Symbol Table E
r
position ....
r
initial …. o
intermediate code generator r
rate….
temp1 := inttoreal(60) s
temp2 := id3 * temp1
temp3 := id2 + temp2 3 address code
id1 := temp3
code optimizer
temp1 := id3 * 50.0
id1 := id2 + temp1
final code generator
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVFProf.
R1,SAM
id1 35
Grouping of Compiler Phases

• Front end
 Consist of those phases that depend on the source language but
largely independent of the target machine.
• Back end
 Consist of those phases that are usually target machine dependent
such as optimization and code generation.

Prof. SAM 36
• Example 2:

Prof. SAM 37
Lexical Analysis
Lexical Analysis

Syntax Analysis
while (y < z) {
Semantic Analysis
int x = a + b;
y += x; IR Generation

} IR Optimization

Code Generation

Code Optimization

Prof. SAM 38
Prof. SAM 39
Outcome of Lexical Analyzer

Prof. SAM 40
• Groups the tokens into Syntactic Structures.

Prof. SAM 41
 Ensures that the components of a program fit together meaningfully
 Gathers type information and checks for type compatibility
Prof. SAM 42
• Simple Instructions produced by the
syntax Analyzer is IR.

• IR has two properties: Easy to use, Easy


To translate.

• Eg: TAC

Prof. SAM 43
• Improve the Intermediate Code so that
the ultimate object program runs faster
and or takes less space.

Prof. SAM 44
Machine code is generated.

Prof. SAM 45
Prof. SAM 46
The text book

Alfred V. Aho, Ravi Sethi and Jeffry D. Ulman,


Compilers Principles, Techniques and Tools, Addison
Wesley, 2007 second edition, pearson

Prof. SAM 47

You might also like