Compiler Design
Compilers
• What is a compiler?
▫ A program that translates an executable program in one language
into an executable program in another language
▫ The compiler should improve the program, in some way
• What is an interpreter?
▫ A program that reads an executable program and produces the
results of executing that program
• Translate in steps
▫ Lexical Analysis
▫ Syntax Analysis
▫ Semantic Analysis
▫ Optimization
▫ Code Generation
Lexical Analysis
If a == b then a = 1 ; else a = 2 ;
- Sequence of words (total ? words)
Lexical Analysis
• Lexical analysis is the process of identifying
the words (tokens) in an input string of characters,
so that the parser can handle the input more easily.
• The words must be separated by some predefined
delimiter, or the language may impose rules for
breaking the sentence into tokens, which are then
passed on to the next phase, syntax analysis.
• In programming languages, a character from a
different class may also act as a word separator:
in a==b, the change from a letter to an operator
character ends the word a.
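The separation rules above can be sketched as a small tokenizer. This is an illustrative sketch, not a production lexer: the names classify and tokenize and the four character classes are assumptions made for the example, and real languages need finer rules (for instance, a=-1 would be split incorrectly here).

```cpp
#include <cctype>
#include <string>
#include <vector>

// Character classes used to decide where one word ends and the next begins.
// (Hypothetical classes chosen for this sketch.)
enum class CharClass { Space, Word, Operator, Punctuation };

CharClass classify(char c) {
    unsigned char u = static_cast<unsigned char>(c);
    if (std::isspace(u)) return CharClass::Space;
    if (std::isalnum(u) || c == '_') return CharClass::Word;
    if (std::string("=<>!+-*/").find(c) != std::string::npos) return CharClass::Operator;
    return CharClass::Punctuation;
}

// Split the input into words. A word ends at whitespace or where the
// character class changes; punctuation characters are one-character words.
std::vector<std::string> tokenize(const std::string& input) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        CharClass cls = classify(input[i]);
        if (cls == CharClass::Space) { ++i; continue; }   // delimiter, not a word
        std::size_t start = i++;
        if (cls != CharClass::Punctuation) {
            // Extend the word while the character class stays the same.
            while (i < input.size() && classify(input[i]) == cls) ++i;
        }
        tokens.push_back(input.substr(start, i - start));
    }
    return tokens;
}
```

With this sketch, tokenize("a==b;") yields the four words a, ==, b and ; even though the input contains no spaces.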
Syntax Analysis
• Understand sentence structure
• Parsing = Diagramming sentences
▫ The diagram is a tree
Parsing
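To make "the diagram is a tree" concrete, here is a minimal recursive-descent sketch. The grammar (single-character operands with + and *) and the names Node, Parser and toString are assumptions chosen for illustration; they do not come from the slides.

```cpp
#include <cctype>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// A node of the parse tree: leaves hold operands, inner nodes hold operators.
struct Node {
    std::string label;
    std::unique_ptr<Node> left, right;
};

class Parser {
public:
    explicit Parser(std::string text) : text_(std::move(text)) {}

    // expr -> term ('+' term)*
    std::unique_ptr<Node> parseExpr() {
        auto node = parseTerm();
        while (peek() == '+') {
            ++pos_;
            auto rhs = parseTerm();
            node = makeNode("+", std::move(node), std::move(rhs));
        }
        return node;
    }

private:
    // term -> factor ('*' factor)*
    std::unique_ptr<Node> parseTerm() {
        auto node = parseFactor();
        while (peek() == '*') {
            ++pos_;
            auto rhs = parseFactor();
            node = makeNode("*", std::move(node), std::move(rhs));
        }
        return node;
    }

    // factor -> a single letter or digit
    std::unique_ptr<Node> parseFactor() {
        char c = peek();
        if (!std::isalnum(static_cast<unsigned char>(c)))
            throw std::runtime_error("expected an operand");
        ++pos_;
        return makeNode(std::string(1, c), nullptr, nullptr);
    }

    static std::unique_ptr<Node> makeNode(std::string label,
                                          std::unique_ptr<Node> left,
                                          std::unique_ptr<Node> right) {
        auto node = std::make_unique<Node>();
        node->label = std::move(label);
        node->left = std::move(left);
        node->right = std::move(right);
        return node;
    }

    // Skip blanks and return the next character, or '\0' at end of input.
    char peek() {
        while (pos_ < text_.size() && text_[pos_] == ' ') ++pos_;
        return pos_ < text_.size() ? text_[pos_] : '\0';
    }

    std::string text_;
    std::size_t pos_ = 0;
};

// Print the tree with explicit parentheses so its shape is visible.
std::string toString(const Node& n) {
    if (!n.left) return n.label;
    return "(" + toString(*n.left) + " " + n.label + " " + toString(*n.right) + ")";
}
```

Parsing "a + b * c" with this sketch gives a tree that prints as (a + (b * c)): the multiplication becomes a subtree of the addition because term binds tighter than expr in the assumed grammar.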
Semantic Analysis
• Understanding the meaning of the sentence
• Too hard for compilers.
• However, compilers do perform analysis to catch
inconsistencies
Semantic Analysis
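• Jack said Jerry left his assignment at home
▫ Does "his" refer to Jack or to Jerry? Programming languages
define binding rules to avoid such ambiguity: a name refers
to its innermost enclosing declaration.
{
    int Jack = 3;
    {
        int Jack = 4;   // shadows the outer Jack
        cout << Jack;   // prints 4
    }
}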
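One common way a compiler resolves such bindings is a stack of scopes that is searched from the innermost outwards. The sketch below assumes that approach; the class name SymbolTable and its methods are hypothetical and only illustrate the idea.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// A stack of scopes; each scope maps a name to its value.
class SymbolTable {
public:
    SymbolTable() { enterScope(); }             // start with a global scope

    void enterScope() { scopes_.emplace_back(); }

    void exitScope() {
        if (scopes_.size() > 1) scopes_.pop_back();   // never drop the global scope
    }

    void declare(const std::string& name, int value) {
        scopes_.back()[name] = value;           // always declares in the innermost scope
    }

    // Search from the innermost scope outwards; nullptr means "undeclared".
    const int* lookup(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return &found->second;
        }
        return nullptr;
    }

private:
    std::vector<std::unordered_map<std::string, int>> scopes_;
};
```

Declaring Jack as 3, entering a scope and declaring Jack as 4 makes lookup("Jack") find 4; after exitScope() the same lookup finds 3 again.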
Semantic Analysis
• Compilers perform many other checks besides
variable bindings
• Type checking:
▫ Jack left her work at home
▫ There is a type mismatch between her and Jack.
• In the statement:
double y = "Hello World";
• Semantic analysis reveals that "Hello World" is
a string, while y is of type double.
• This is a type mismatch.
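A type check of this kind can be sketched as a compatibility test between the declared type and the type of the initializer. The rule set below (identical types, plus int widening to double) and the names Type, isAssignable and checkInitialization are assumptions for illustration, not the rules of any particular language.

```cpp
#include <string>

enum class Type { Int, Double, String };

// Assumed rule set: identical types are compatible, and an int may be
// widened to a double. Everything else is a mismatch.
bool isAssignable(Type target, Type value) {
    if (target == value) return true;
    return target == Type::Double && value == Type::Int;
}

std::string typeName(Type t) {
    switch (t) {
        case Type::Int:    return "int";
        case Type::Double: return "double";
        case Type::String: return "string";
    }
    return "unknown";
}

// Returns an empty string when the declaration type-checks,
// otherwise a diagnostic message.
std::string checkInitialization(const std::string& name, Type declared, Type initializer) {
    if (isAssignable(declared, initializer)) return "";
    return "type mismatch: cannot initialize " + typeName(declared) +
           " '" + name + "' with " + typeName(initializer);
}
```

For the declaration of y above, checkInitialization("y", Type::Double, Type::String) returns a diagnostic, while an int initializer would be accepted under the assumed widening rule.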
Semantic Analysis
• Semantic analysis is the process of examining
the statements to make sure that they make
sense.
• During the semantic analysis, the types, values,
and other required information about
statements are recorded, checked, and
transformed appropriately to make sure the
program makes sense.
• Ideally there should be no ambiguity in the
grammar of the language. Each sentence should
have just one meaning.
Optimization
• Automatically modify programs so that they
▫ Run faster
▫ Use fewer resources (memory, registers, space,
fewer fetches etc.)
• Example: x = 15 * 3 is transformed to x = 45
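The transformation in this example is usually called constant folding. A minimal sketch over a tiny expression tree follows; the Expr type and the helpers constant, variable, multiply and fold are hypothetical names, and only multiplication is handled.

```cpp
#include <memory>
#include <string>
#include <utility>

// A tiny expression tree with constants, variables and multiplication.
struct Expr {
    enum class Kind { Constant, Variable, Multiply };
    Kind kind = Kind::Constant;
    int value = 0;                      // used when kind == Constant
    std::string name;                   // used when kind == Variable
    std::unique_ptr<Expr> lhs, rhs;     // used when kind == Multiply
};

std::unique_ptr<Expr> constant(int value) {
    auto e = std::make_unique<Expr>();
    e->value = value;
    return e;
}

std::unique_ptr<Expr> variable(std::string name) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Kind::Variable;
    e->name = std::move(name);
    return e;
}

std::unique_ptr<Expr> multiply(std::unique_ptr<Expr> lhs, std::unique_ptr<Expr> rhs) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Kind::Multiply;
    e->lhs = std::move(lhs);
    e->rhs = std::move(rhs);
    return e;
}

// Replace every multiplication of two constants by a single constant.
std::unique_ptr<Expr> fold(std::unique_ptr<Expr> e) {
    if (e->kind != Expr::Kind::Multiply) return e;
    e->lhs = fold(std::move(e->lhs));
    e->rhs = fold(std::move(e->rhs));
    if (e->lhs->kind == Expr::Kind::Constant && e->rhs->kind == Expr::Kind::Constant)
        return constant(e->lhs->value * e->rhs->value);
    return e;                           // operands not both known at compile time
}
```

Folding multiply(constant(15), constant(3)) produces the constant 45, whereas an expression involving a variable is left as a multiplication.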
Optimization
PI = 3.14159                       3A+4M+1D+2E
Area = 4 * PI * R^2
Volume = (4/3) * PI * R^3
--------------------------------
X = 3.14159 * R * R                3A+5M
Area = 4 * X
Volume = 1.33 * X * R
--------------------------------
Area = 4 * 3.14159 * R * R         2A+4M+1D
Volume = ( Area / 3 ) * R
--------------------------------
Area = 12.56636 * R * R            2A+3M+1D
Volume = ( Area / 3 ) * R
--------------------------------
X = R * R                          3A+4M
Area = 12.56636 * X
Volume = 4.18879 * X * R           (4/3) * PI = 4.18879
A : assignment M : multiplication
D : division E : exponent
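The first and last forms can be compared directly in code to check that the rewriting preserves the results. This is a sketch under the slide's value PI = 3.14159; the function and struct names are made up for the example, and the constant 4.18879 is (4/3) * 3.14159 rounded to five decimals.

```cpp
#include <cmath>

const double PI = 3.14159;

// First form: uses exponentiation and a division (3A+4M+1D+2E).
double areaNaive(double r)   { return 4 * PI * std::pow(r, 2); }
double volumeNaive(double r) { return (4.0 / 3.0) * PI * std::pow(r, 3); }

struct Sphere {
    double area;
    double volume;
};

// Last form: R*R is computed once and the constants are pre-multiplied (3A+4M).
Sphere sphereOptimized(double r) {
    const double x = r * r;
    Sphere s;
    s.area = 12.56636 * x;        // 4 * PI
    s.volume = 4.18879 * x * r;   // (4/3) * PI, rounded
    return s;
}
```

Both versions agree up to the rounding of the pre-multiplied constants, which is the kind of small numerical difference such rewrites can introduce.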
Optimization
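int x = 2;
int y = 3;
int array[5];
for (int i = 0; i < 5; i++)
    array[i] = x + y;      // x + y is recomputed on every iteration
____________________________________
int x = 2;
int y = 3;
int z = x + y;             // loop-invariant computation moved out of the loop
int array[5];
for (int i = 0; i < 5; i++)
    array[i] = z;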
Code Generation
• A translation into another language
▫ Similar to human translation
• Usually produces assembly code
Source Code -> Intermediate Code -> Target Code
Intermediate Language should be:
▫ Easy to Produce
▫ Easy to Translate into Target Language
An Observation
• The overall structure of every compiler adheres
to this outline
• Proportions have changed since the first
compiler was written for Fortran
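[Figure: relative proportions of the phases L (lexical analysis), P (parsing), S (semantic analysis), O (optimization) and CG (code generation), shown twice for comparison]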
How to translate?
• Translate in steps. Each step handles a
reasonably simple, logical, and well defined task
• Design a series of program representations
• Intermediate representations should be
amenable to program manipulation of various
kinds (type checking, optimization, code
generation etc.)
• Representations become more machine specific
and less language specific as the translation
proceeds
How to Translate?
• Many modern compilers share a common 'two stage'
design.
▫ The "front end" translates the source language, i.e. the high level
program, into an intermediate representation.
▫ The second stage is the "back end", which works with the intermediate
representation to produce code in the output language, which is
low level code.
• The higher the abstraction a compiler can support, the
better it is.
Structure of a Compiler
• Also known as Analysis-Synthesis model of
compilation
▫ Front end phases are known as analysis phases
▫ Back end phases are known as synthesis phases
• Each phase has a well-defined task
• Each phase handles a logical activity in the
process of compilation
Advantages of the Model
• Compiler is retargetable
▫ Since each phase handles a logically distinct part of the
compiler's work, parts of the code can be reused to
build new compilers.
• Source- and machine-independent code optimization is
possible.
• An optimization phase can be inserted after the front end
and back end have been developed and deployed.
• When adding optimization, improving the performance of
one phase should not affect the performance of the other
phases; this model makes that possible.
M*N vs M+N problem
If every language/machine pair needs its own compiler,
then for M languages and N machines we need to
develop M*N compilers
M*N vs M+N problem
• We design the front end independent of machines and the
back end independent of the source language.
• We require a Universal Intermediate Language (UIL) that acts
as an interface between front end and back end.
• Thus we need to design only M front ends and N back ends.
• To design a compiler for language L that produces output for
machine C, we take the front end for L and the back end for C.
In this way, we require only M + N components (M front ends and
N back ends) instead of M*N compilers for M source languages and
N machine architectures.
• For large M and N, this is a significant reduction in the effort.
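The M + N idea can be sketched by treating front ends and back ends as interchangeable functions that meet at a shared intermediate representation. Everything named here (IR as a plain string, CompilerKit, makeFrontEnd, makeBackEnd, and the language and machine names) is a hypothetical stand-in chosen for the example.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Stand-in for the intermediate language: just a string in this sketch.
using IR = std::string;
using FrontEnd = std::function<IR(const std::string&)>;
using BackEnd = std::function<std::string(const IR&)>;

// Toy front end: wraps the source text to mark which language produced the IR.
FrontEnd makeFrontEnd(const std::string& language) {
    return [language](const std::string& source) {
        return "ir(" + language + ":" + source + ")";
    };
}

// Toy back end: wraps the IR to mark which machine the code targets.
BackEnd makeBackEnd(const std::string& machine) {
    return [machine](const IR& ir) { return machine + "[" + ir + "]"; };
}

struct CompilerKit {
    std::map<std::string, FrontEnd> frontEnds;   // one per source language (M)
    std::map<std::string, BackEnd> backEnds;     // one per target machine (N)

    std::size_t componentsWritten() const { return frontEnds.size() + backEnds.size(); }
    std::size_t compilersAvailable() const { return frontEnds.size() * backEnds.size(); }

    // Any front end can be paired with any back end through the IR.
    std::string compile(const std::string& language, const std::string& machine,
                        const std::string& source) const {
        return backEnds.at(machine)(frontEnds.at(language)(source));
    }
};

CompilerKit makeDemoKit() {
    CompilerKit kit;
    for (const char* lang : {"C", "Java", "Python"}) kit.frontEnds[lang] = makeFrontEnd(lang);
    for (const char* machine : {"x86", "arm"}) kit.backEnds[machine] = makeBackEnd(machine);
    return kit;
}
```

In the demo kit, three front ends and two back ends (five components) yield six language/machine combinations; the gap widens quickly as M and N grow.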
Universal Intermediate Language
• Universal Computer/Compiler Oriented Language
(UNCOL)
• Suggested in 1958 to reduce the developmental effort of
compiling many different languages to different
architectures
• Due to vast differences between programming languages
and machine architectures, design of such a language is
not possible.
• We can group programming languages with similar
characteristics together
• Similarly an intermediate language is designed for
similar machines.
How to reduce development and
testing effort?
• DO NOT WRITE COMPILERS, GENERATE
compilers
• A compiler generator should be able to
"generate" a compiler from the source language
and target machine specifications
Advantages
• Changing the specification of a phase can lead to a new
compiler
▫ If the machine specification is changed, the compiler can
generate code for a different machine without changing any
other phase
▫ If the front end specification is changed, we get a
compiler for a new language
• Tool-based compiler development cuts down
development/maintenance time by almost 30-40%
• Tool development/testing is a one-time effort
• Compiler performance can be improved by improving a
tool and/or the specification for a particular phase
Types of Compilers – Native vs Cross
[Figure: diagrams contrasting native and cross compilers]
Bootstrapping
• Notation: XYZ denotes a compiler for source language X, implemented
in Y (a language or a machine's code), that generates code for
machine Z.
• Goal: develop a compiler for a language L, written in L, that runs on
machine N (LNN). For this we require an existing compiler for L that
runs on machine M and outputs code for machine M (LMM).
• First we write LLN, i.e. a compiler written in L that converts
code written in L to code that can run on machine N.
• We then compile this compiler program with the available compiler
LMM. This gives a compiler that runs on machine M and converts code
written in L to code that can run on machine N, i.e. LMN.
• Next we compile the original compiler source LLN again, this time
with the new compiler LMN from the previous step. This converts the
compiler code written in L to code that can run on machine N.
• So we finally have a compiler that runs on machine N and converts
code in language L to code that runs on machine N, i.e. LNN.
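The bookkeeping in these steps can be checked mechanically by describing each compiler with three attributes and modelling "compile this compiler with that one". The struct Compiler and the function compileWith are hypothetical names for this sketch; the three-letter names follow the notation used in the steps above.

```cpp
#include <stdexcept>
#include <string>

// A compiler described by three attributes: the language it accepts,
// what its own code is written in, and the machine it generates code for.
struct Compiler {
    std::string source;
    std::string implementedIn;
    std::string target;

    std::string name() const { return source + implementedIn + target; }
};

// Compile the compiler `subject` using the compiler `tool`.
// The result still translates subject.source to subject.target,
// but its own code is now in the form that `tool` emits.
Compiler compileWith(const Compiler& subject, const Compiler& tool) {
    if (tool.source != subject.implementedIn)
        throw std::invalid_argument("tool cannot read the subject's implementation language");
    return Compiler{subject.source, tool.target, subject.target};
}
```

Starting from LLN and LMM, compileWith(LLN, LMM) gives LMN, and compileWith(LLN, LMN) gives LNN, matching the two compilation steps described above.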
The Economy of Languages
• Why are there so many programming languages?
▫ Application domains have distinctive/ conflicting
requirements
• Why are there new programming languages?
• What is a good programming language?
Why are there new programming
languages?
• Claim: Programmer cost is the dominant cost for
a programming language.
• Predictions
▫ Widely used languages are slow to change
▫ Easy to start a new language
▫ Languages are created or evolve to fill a void
• New programming languages tend to look like
existing ones
Role of Programming Languages
• Getting the answer
• Correctness/Precision
• Efficiency
• User friendliness
Influences on Evolution of Language
Design
• Computer Capabilities
• Applications
▫ Commercial, Military, Scientific, Medical, Astronomical,
Business, Industrial, Personal (Games)
• Programming Methods
• Implementation Methods
• Theoretical Studies
• Standardization
What is a good programming language?
• There is no universally accepted metric for
language design
• A good language is one that most people use?
Characteristics of a Good Language
• Clarity, Simplicity and Unity
▫ Unified set of concepts that can be used as primitives for
developing an algorithm.
• Orthogonality
▫ Attribute of being able to combine various features of a language in
all possible combinations, with each combination being meaningful.
• Naturalness for the Application
• Support for abstraction
• Ease of program verification
• Programming environment
• Portability of the programs
• Cost
Cost Explained
• Cost
▫ of training
▫ of writing the program
▫ of executing the program
▫ of translation
▫ of implementation
▫ of maintenance
Thank you!