Compiler design
Text Book: Compilers: Principles, Techniques, and Tools,
by Aho, Sethi, and Ullman.
Topics:
Compiler phases
Lexical analysis
Syntax analysis
Code generation
Homework: there will be two major programming
assignments. They must be done independently.
Examinations: there will be two hourly exams and a final
exam.
Grading: the total grade will be computed as follows:
O 20% for each hourly exam
O 15% for the homework
O 50% for the final exam.
Overview of Compiler
O A compiler is a program (usually written in a
high-level language) that translates a source
program written in a high-level language into
equivalent machine code.
source program --> [ compiler ] --> machine code (or object code)
What is a Compiler?
O Definition: A compiler is a program
that translates one language to
another
O Usually, the translation takes place
between a high-level language and
a low-level language
O Clearly, our first step is to discuss
some terminology...
Terminology
O Source language - the language that
is being translated
O Object language - the language into
which the translation is being done
O High-level language — a language that
is far removed from a computer; one
which is close to the problem area(s)
for which the language is designed
O Low-level language - a language that
is close to the machine (computer)
upon which the language will run
(execute)
O Object language - (sometimes called
machine code) the language of some
computer. This language usually is
not human readable (and is
expressed in bits or hex)
O Intermediate language - a language that
is used either:
  - because it is a temporary step in the
    translation process; or,
  - because it is neither particularly high nor
    low, and is the output of a translation
O Assembly language - a language that
translates almost one-to-one to machine
language, but is in human-readable form
What's a Compiler?...
O Today, compilers are written using high-
level languages (such as Java, C++, etc.)
O The earliest compilers were written using
assembly language (e.g., FORTRAN and
COBOL around 1954)
O Sometimes a compiler is written in the
same language that it compiles. This is
done through bootstrapping.
Why Should I Learn Compiler Construction?
O How do compilers work?
O How do computers work? (instruction set,
registers, addressing modes, run-time data
structures, ...)
O What machine code is generated for certain
language constructs? (efficiency
considerations)
O Getting "a feeling" for good language design
Why Compilers? A Brief History
O The first computers were “hard-
wired”
O That is, they were collections of
physical devices that connected to
one-another, in an assemblage
designed to calculate particular kinds
of results
O For example, Babbage’s Analytical
Engine and his Difference Engine
were assemblages of gears that
solved numeric problems
O The primary driving force was the
calculation of ballistics tables for
artillery
O Jacquard’s loom is another example
O And Hollerith’s work for the US
Census Bureau is another
O In the late 1940s John von Neumann
“invented” the stored program
computer
O The “invention” is the observation
that just as you can store data in the
memory of a computer, the data can
be machine instructions
O Then the computer can not only take
its instructions from memory...
O But the computer can modify the
instructions in its memory...
O And, in fact, can write its own
programs, storing them in memory
O It quickly became apparent that the
simplest way to store information in a
computer was in the form of binary
numbers
O So, to program a computer, you only
needed to enter a sequence of binary
numbers into memory, and then tell
the computer at which memory
address to start execution
O This was programming in machine
language
O Instructions (and data) were entered
from a console, one word (in binary)
at a time...
O This form of coding (note the word!)
quickly was replaced by programming
in assembly language
O A program, called an assembler, was
written (in machine language) to translate
assembly language into machine language
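A minimal sketch of what an assembler does, assuming a made-up accumulator machine: the mnemonics (`LOAD`, `ADD`, `STORE`, `HALT`), their opcode numbers, and the 8-bit word layout (4-bit opcode, 4-bit address) are all hypothetical and chosen only for illustration.

```python
# Toy assembler for a HYPOTHETICAL accumulator machine (illustration only).
# Each machine word is 8 bits: a 4-bit opcode followed by a 4-bit address.
OPCODES = {"LOAD": 0b0001, "ADD": 0b0010, "STORE": 0b0011, "HALT": 0b1111}


def assemble(source: str) -> list[int]:
    """Translate assembly text (one instruction per line) into machine words."""
    words = []
    for line in source.strip().splitlines():
        parts = line.split()
        mnemonic = parts[0].upper()
        address = int(parts[1]) if len(parts) > 1 else 0
        if not 0 <= address <= 15:
            raise ValueError(f"address {address} does not fit in 4 bits")
        words.append((OPCODES[mnemonic] << 4) | address)
    return words


program = """
LOAD 10
ADD 11
STORE 12
HALT
"""
# Show each word as the binary number that would be entered into memory.
print([format(word, "08b") for word in assemble(program)])
```

Each source line becomes exactly one machine word: the near one-to-one mapping that separates assembly language from a high-level language.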
O After the first assembler was written,
no one needed to code in machine
language any longer
O But coding even a single arithmetic statement
can take many instructions...
O So, the thought was: can we create
a program that translates something
like an arithmetic statement into assembly language
or into machine language?
Why Compilers? A Brief History: Formal Languages
O About the same time, in the mid-
1950’s, Noam Chomsky (M.I.T.)
began investigating the formal
structure of natural languages
O His work led to the Chomsky
hierarchy of type 0, 1, 2, 3 languages
and their associated grammars
O The type 2 (context-free) grammars
turned out to be very good at
describing computer languages
O And, efficient ways to recognize the
structure of a source program using a
type 2 grammar were developed
O Such recognition is called parsing
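As an illustration of parsing with a type 2 grammar, here is a minimal recursive-descent parser for a small, hypothetical expression grammar (shown in the comments). Each nonterminal becomes one method; the token list is assumed to come from a scanner already, and the nested-tuple tree format is a choice made for this sketch.

```python
from __future__ import annotations

# Recursive-descent parser for a small context-free (type 2) grammar:
#   expr   -> term   (("+" | "-") term)*
#   term   -> factor (("*" | "/") factor)*
#   factor -> NUMBER | "(" expr ")"
# Input tokens are plain strings; producing them is the scanner's job.
# Output is a parse tree written as nested tuples (op, left, right).


class Parser:
    def __init__(self, tokens: list[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> str | None:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> str | None:
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())  # the loop gives left associativity
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        tok = self.advance()
        if tok == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


print(Parser(["2", "+", "3", "*", "(", "4", "-", "1", ")"]).parse())
```

Writing one function per nonterminal is what makes recursive descent practical to build by hand; parser generators automate the same kind of recognition for larger grammars.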
O Very closely related to context-free
grammars are the type 3 grammars
O These are equivalent to finite
automata and regular expressions
O An entire sub-branch of mathematics
studies automata; it’s called
automata theory
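A type 3 language can be recognized by a finite automaton. The sketch below hand-codes a deterministic finite automaton (DFA) for C-style identifiers (a letter or underscore, then letters, digits, or underscores); the state names and the `char_class` helper are illustrative choices, not taken from any particular tool.

```python
# Deterministic finite automaton (DFA) accepting C-style identifiers,
# i.e. the regular language [A-Za-z_][A-Za-z0-9_]*.
# States: "start" (nothing read yet), "ident" (accepting), "dead" (reject).
ACCEPTING = {"ident"}

TRANSITIONS = {
    ("start", "letter"): "ident",
    ("ident", "letter"): "ident",
    ("ident", "digit"): "ident",
}


def char_class(ch: str) -> str:
    """Map a character to the input alphabet the DFA is defined over."""
    if ch.isascii() and (ch.isalpha() or ch == "_"):
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"


def is_identifier(text: str) -> bool:
    state = "start"
    for ch in text:
        # Any (state, class) pair without a transition leads to "dead".
        state = TRANSITIONS.get((state, char_class(ch)), "dead")
        if state == "dead":
            return False
    return state in ACCEPTING


for word in ["x", "my_name", "Your_Name", "a1", "1abc", "my-name"]:
    print(word, is_identifier(word))
```

The same language can equally be written as the regular expression `[A-Za-z_][A-Za-z0-9_]*`; the automaton and the expression are two descriptions of one type 3 language.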
O It turns out that type 3 (regular)
grammars are very good at describing
the “atoms” used in computer
languages
O These “atoms” are the reserved
words, symbols, and user-defined
words that are used in a computer
language
O Recognizing atoms is called scanning
(or lexing)
Why Compilers? A Brief History...
O By far the most difficult and
complicated problem has been how
to generate object code that is
concise, and most importantly,
executes efficiently
O This is called “optimization”
O Far simpler are the front-end issues
of scanning and parsing, i.e., recognizing
the source code
O This is because we’ve
developed (semi-)automatic ways to
create scanners and parsers,
using scanner generators and parser
generators
Programs Related to Compilers
O Interpreters - directly execute
the code as it is recognized,
usually statement by statement
O Assemblers - translate
assembly language to machine
language
O Macro Assemblers - ditto, but
with (powerful) macro
capabilities
O Linkers - combine object
modules to produce an
executable module
O Linkage Editors - manage the
linking process, and are able to
create/maintain object libraries
O Loaders - load executable
modules into memory, and
launch execution
O Dynamic Loaders - loaders that
stay around during execution to
handle the loading of DLLs
(dynamic-link libraries)
O Preprocessors - usually a
separate program whose input is
source code and whose output is
source code; they perform macro
expansion, comment deletion,
etc. Sometimes the preprocessor is the first phase
of a compiler
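A minimal sketch of the two preprocessor jobs just named, comment deletion and macro expansion. It assumes C-style `/* ... */` and `//` comments and only simple object-like `#define NAME value` macros (no parameters, no string-literal handling, a single level of expansion); real C preprocessors do considerably more.

```python
import re

# Toy preprocessor: source code in, source code out.
# ASSUMPTIONS (for illustration only): C-style /* ... */ and // comments,
# object-like "#define NAME value" macros without parameters, no comment
# markers inside string literals, and a single level of macro expansion.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
LINE_COMMENT = re.compile(r"//[^\n]*")
DEFINE = re.compile(r"\s*#define\s+([A-Za-z_]\w*)\s+(.*)")
WORD = re.compile(r"\b[A-Za-z_]\w*\b")


def preprocess(source: str) -> str:
    # 1. Comment deletion (each block comment becomes a single space).
    text = BLOCK_COMMENT.sub(" ", source)
    text = LINE_COMMENT.sub("", text)

    # 2. Collect macro definitions and drop the #define lines.
    macros: dict[str, str] = {}
    kept = []
    for line in text.splitlines():
        m = DEFINE.match(line)
        if m:
            macros[m.group(1)] = m.group(2).strip()
        else:
            kept.append(line)

    # 3. Macro expansion: replace each defined name with its value.
    return WORD.sub(lambda m: macros.get(m.group(), m.group()), "\n".join(kept))


print(preprocess("#define SIZE 100\nint a[SIZE]; /* array */\nint n = SIZE; // count"))
```

The output is still source code, which is why a preprocessor can run as a separate program in front of the compiler proper.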
O Editors - allow the user to create and
update source code
O Smart Editors - include syntax
coloring, parenthesis balancing, etc.
O Debuggers - programs that provide
an environment in which code may be
debugged, including single stepping,
symbol tables, etc.
O IDEs - integrated development
environments; provide integrated
editor-debugger-execution
environments
O Profilers - collect statistics about
where programs spend their time
during execution; important for
optimizing at the source code level
O Project Managers - programs that
help software managers deal with
hundreds or thousands of modules;
build reports, etc.
O SCCS - source code control systems;
provide multiple access to shared
code in a controlled manner
The Translation Process
O The translation process consists of
a collection of phases, with the output
of one phase feeding the input of the
next
O The original source code is
transformed into a sequence of
intermediate representations (IRs)
during this process
[Figure: The Translation Process]
Phases of Compiler
Running in parallel with all the other phases are two
activities:
O Symbol table management. The symbol
table is one of the primary data
structures that a compiler uses; it
is used by all of the phases.
O Error detection and handling
The Scanner
O The scanner reads the source
program, as a stream of characters,
and it performs lexical analysis,
collecting sequences of characters
into meaningful units called tokens
O The scanner also may create a
symbol table and a literal table
The Parser
O The parser reads the tokens produced
by the scanner and performs
syntactic analysis, creating an IR (a
parse tree or a syntax tree) showing
the structure of the program
O Syntax trees (abstract syntax trees)
are condensed versions of the parse
tree, with many irrelevant nodes
eliminated
The Semantic Analyzer
O The semantics of a program are its
“meaning”: what it is intended to
accomplish
O The semantic analyzer creates an
intermediate data structure that
contains this meaning; these are the
static semantics
O The dynamic semantics of a program
can be determined only by
executing the program
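As a small illustration of static semantics, the sketch below computes the data type of each expression node, looking variables up in a symbol table (a plain dictionary here), and stores the result on the node as an attribute. The node classes, the two-type system (`int`, `float`), and the promotion rule are assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


class SemanticError(Exception):
    """Raised when a static-semantic rule is violated."""


# Hypothetical AST node classes; `type` is the attribute the analyzer fills in.
@dataclass
class Num:
    value: int | float
    type: str | None = None


@dataclass
class Var:
    name: str
    type: str | None = None


@dataclass
class BinOp:
    op: str
    left: Num | Var | BinOp
    right: Num | Var | BinOp
    type: str | None = None


def check(node: Num | Var | BinOp, symbols: dict[str, str]) -> str:
    """Decorate `node` (and its children) with static types; return its type.

    `symbols` is the symbol table: declared variable name -> declared type.
    """
    if isinstance(node, Num):
        node.type = "int" if isinstance(node.value, int) else "float"
    elif isinstance(node, Var):
        if node.name not in symbols:
            raise SemanticError(f"undeclared variable {node.name!r}")
        node.type = symbols[node.name]
    elif isinstance(node, BinOp):
        left_type = check(node.left, symbols)
        right_type = check(node.right, symbols)
        # Assumed rule: int op int -> int; any float operand -> float.
        node.type = "int" if left_type == right_type == "int" else "float"
    else:
        raise SemanticError(f"unknown node {node!r}")
    return node.type


# Symbol table as it might be built from declarations like "int a; float x;".
symbols = {"a": "int", "x": "float"}
tree = BinOp("+", Var("a"), BinOp("*", Num(2), Var("x")))
print(check(tree, symbols), tree.left.type, tree.right.type)
```

All of this is decided without running the program, which is what makes it static rather than dynamic semantics.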
O An example of the static semantics of
a program is the data types of the
variables (and expressions)
O These static semantics usually are
represented in the intermediate
representations (IRs) as attributes
O The IR usually is a tree, “decorated”
with these attributes
(Source) Code Optimization
O Optimization may occur during
several phases
O Source code optimization rearranges
the source (or the IR of the source)
to produce more efficient code
O E.g., an expression whose operands are all
constants can be replaced by its computed value
O This is called constant folding
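A minimal constant-folding sketch over expression trees written as nested tuples `(op, left, right)`, a representation chosen just for this example. Any subtree whose operands are both constants is replaced by its value; subtrees that involve a variable are left alone. Division is deliberately left out so the sketch never has to decide how to handle division by zero at compile time.

```python
import operator

# Expression trees: an int constant, a variable name (str), or a tuple
# (op, left, right). Division is deliberately left out so the folder
# never has to decide what to do about division by zero at compile time.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(expr):
    """Constant folding: evaluate constant subexpressions at compile time."""
    if not isinstance(expr, tuple):
        return expr  # a constant or a variable: nothing to fold
    op, left, right = expr
    left, right = fold(left), fold(right)  # fold children first (bottom-up)
    if isinstance(left, int) and isinstance(right, int) and op in OPS:
        return OPS[op](left, right)  # both operands known: compute now
    return (op, left, right)  # a variable is involved: keep the operation


print(fold(("+", "x", ("*", 2, 3))))  # the 2 * 3 part becomes 6
```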
O Duplicated computations can be
saved as temporaries and then
their values re-used
O Recursion can be converted to
iteration
O Repeated calculations can be moved
out of loops
O The possibilities are endless...
The Code Generator
O The code generator takes the IR and
generates code for the target
machine
O Here the details of how various
numeric and non-numeric quantities
are represented become important
O E.g., word length, hardware stack,
hardware calling conventions,
memory access, etc.
The Target Code Optimizer
O The target code optimizer examines
the emitted target code to see if
further possibilities for optimization
are present and then capitalizes upon
them
O E.g., reusing registers, using a shift
instruction to replace a multiplication
or division by a power of two, etc.
Phases of the compiler
Source Program
--> Lexical Analyzer (Scanner) --> tokens
--> Syntax Analyzer (Parser) --> parse tree
--> Semantic Analyzer --> abstract syntax tree with attributes
Sample Program Compiled
O Consider the example:
int a, b;
a = 100;
b = f(a) + 3;
Source Program --> Lexical Analyzer --> token stream
O Tokens are the entities of interest, as defined by the compiler writer.
A sequence of characters with a collective meaning is grouped to form a token.
O Examples of tokens:
  - Single-character operators: = + - * > <
  - Multi-character operators: ++ -- == <=
  - Numeric constants: 1997 45.89 19.9e+7
  - Keywords: int, while, for
  - Identifiers: x, my_name, Your_Name, a
O Homework: Identify all token types in C programs.
What are the tokens in the example?
 #   Token  Type                #   Token  Type
 1.  int    keyword             8.  =      equal
 2.  a      identifier          9.  100    integer
 3.  ,      comma              10.  ;      semicolon
 4.  b      identifier         11.  b      identifier
 5.  (      left parenthesis   12.  f      identifier
 6.  a      identifier         13.  +      plus
 7.  (      left parenthesis   14.  3      integer
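A token stream like the one in the table can be produced by a scanner driven by regular expressions. The sketch below uses Python's standard `re` module; the token-type names (`KEYWORD`, `IDENT`, `NUMBER`, `OP`, `PUNCT`) and the three-word keyword set are assumptions for illustration, and it covers only the token classes listed earlier.

```python
import re

# Token specification: (type, regular expression). Order matters:
# multi-character operators are listed before their one-character prefixes.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"\+\+|--|==|<=|>=|[=+\-*<>]"),
    ("PUNCT", r"[(),;{}]"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),
]
KEYWORDS = {"int", "while", "for"}
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def scan(source: str) -> list[tuple[str, str]]:
    """Return the (token type, lexeme) pairs of `source`."""
    tokens = []
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not one
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {lexeme!r}")
        if kind == "IDENT" and lexeme in KEYWORDS:
            kind = "KEYWORD"  # reserved words look like identifiers
        tokens.append((kind, lexeme))
    return tokens


for tok in scan("int a, b;\na = 100;\nb = f(a) + 3;"):
    print(tok)
```

Keywords are matched by the identifier pattern and reclassified afterwards, a common way to keep the regular expressions small.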
The parser produces a parse tree: it is a heterogeneous
tree (nodes have different data types)
            root_node
           /         \
       stmt1         stmt2
         |             |
         =             =
        / \           / \
       a   100       b   +
                        / \
                       f   3
                       |
                       a
Intermediate-Code Generation
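The three-address code listed next can be produced by a post-order walk over the syntax tree, allocating a fresh temporary for each intermediate value. The sketch below models the heterogeneous tree as Python dataclasses; the class names, the `newtemp` counter, and the rule that constants appear directly as operands of `+` are assumptions chosen to mirror the listing.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


# Heterogeneous syntax-tree nodes: each kind of node is its own type.
@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class Call:
    func: str
    arg: Expr


@dataclass
class BinOp:
    op: str
    left: Expr
    right: Expr


@dataclass
class Assign:
    target: str
    value: Expr


Expr = Union[Num, Var, Call, BinOp]


class CodeGen:
    """Emit three-address code, using a fresh temporary for each value."""

    def __init__(self) -> None:
        self.code: list[str] = []
        self.count = 0

    def newtemp(self) -> str:
        self.count += 1
        return f"t{self.count}"

    def operand(self, e: Expr) -> str:
        # Constants can appear directly as operands; other values need a temp.
        return str(e.value) if isinstance(e, Num) else self.expr(e)

    def expr(self, e: Expr) -> str:
        """Generate code for e (children first) and return the temp holding it."""
        if isinstance(e, Num):
            t = self.newtemp()
            self.code.append(f"{t} = {e.value}")
        elif isinstance(e, Var):
            t = self.newtemp()
            self.code.append(f"load {e.name}, {t}")
        elif isinstance(e, Call):
            arg = self.expr(e.arg)
            t = self.newtemp()
            self.code.append(f"{t} = {e.func}({arg})")
        elif isinstance(e, BinOp):
            left, right = self.operand(e.left), self.operand(e.right)
            t = self.newtemp()
            self.code.append(f"{t} = {left} {e.op} {right}")
        else:
            raise TypeError(f"unknown expression {e!r}")
        return t

    def stmt(self, s: Assign) -> None:
        self.code.append(f"store {self.expr(s.value)}, {s.target}")


# a = 100; b = f(a) + 3;
program = [Assign("a", Num(100)),
           Assign("b", BinOp("+", Call("f", Var("a")), Num(3)))]
gen = CodeGen()
for s in program:
    gen.stmt(s)
print("\n".join(gen.code))
```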
O Using temporary locations to save
values
O t1 = 100
O store t1, a
O load a, t2
O t3 = f(t2)
O t4 = t3 + 3
O store t4, b
Intermediate-Code Optimization
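The optimized listing that follows drops `load a, t2`: `a` was just stored from `t1`, so `t1` can be used directly. A minimal sketch of that particular clean-up, a simple form of redundant-load elimination within one straight-line block. It assumes the exact instruction spellings of the intermediate-code listing above and that nothing except `store` (in particular, not the call to `f`) changes a variable in memory.

```python
import re

# Redundant-load elimination inside one straight-line block of
# three-address code. ASSUMPTIONS (illustration only): instructions are
# strings spelled as in the intermediate-code listing, temporaries are
# named t<digits> and assigned once, and nothing except "store" (in
# particular, no call such as f) changes the value of a variable in memory.
STORE = re.compile(r"store (\w+), (\w+)$")
LOAD = re.compile(r"load (\w+), (\w+)$")
TEMP = re.compile(r"\bt\d+\b")


def eliminate_redundant_loads(code: list[str]) -> list[str]:
    in_temp: dict[str, str] = {}  # variable -> temporary holding its value
    alias: dict[str, str] = {}    # removed temporary -> its replacement
    out = []
    for instr in code:
        # Rename uses of temporaries whose loads were removed.
        instr = TEMP.sub(lambda m: alias.get(m.group(), m.group()), instr)
        load = LOAD.match(instr)
        if load:
            var, temp = load.groups()
            if var in in_temp:
                alias[temp] = in_temp[var]
                continue  # value already in a temporary: drop the load
            in_temp[var] = temp
        store = STORE.match(instr)
        if store:
            temp, var = store.groups()
            in_temp[var] = temp
        out.append(instr)
    return out


ir = ["t1 = 100", "store t1, a", "load a, t2",
      "t3 = f(t2)", "t4 = t3 + 3", "store t4, b"]
print("\n".join(eliminate_redundant_loads(ir)))
```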
O Eliminate unnecessary code or
statements that won't be executed
O t1 = 100
O store t1, a
O t3 = f(t1)
O t4 = t3 + 3
O store t4, b
Target-code Generation
O Machine code generated for some
machine
O r1 = 100
O store r1, 0x10
O jsr _f
O r2 = r0 + 3
O store r2, 0x16
Compiler Architecture
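Single-pass vs. multi-pass architecture
O Single pass: all passes interleaved, driven by the parser
[Figure: the parser uses the scanner, the semantic analyzer, and the code generator; all of them access the symbol table. Legend: A --> B means "A uses B"; other arrows show data flow.]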
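In a single-pass design the parser is in charge: it pulls each token from the scanner only when it needs one and emits code as soon as a construct is recognized, so no complete token list or tree is ever stored. A minimal sketch, assuming a made-up stack-machine output (`push`, `add`, `sub`) and a language of just integer sums and differences:

```python
from __future__ import annotations

import re
from collections.abc import Iterator


def scanner(source: str) -> Iterator[str]:
    """Yield tokens lazily, one at a time, as the parser asks for them."""
    for m in re.finditer(r"\d+|\S", source):
        yield m.group()


def compile_single_pass(source: str) -> list[str]:
    """Parser-driven single pass: scan, parse, and emit code interleaved.

    Grammar: expr -> NUMBER (("+" | "-") NUMBER)*
    Target: a HYPOTHETICAL stack machine (push / add / sub).
    """
    tokens = scanner(source)
    code: list[str] = []

    def number(tok: str | None) -> None:
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected a number, found {tok!r}")
        code.append(f"push {tok}")  # emitted as soon as it is parsed

    number(next(tokens, None))
    while (op := next(tokens, None)) is not None:
        if op not in ("+", "-"):
            raise SyntaxError(f"expected '+' or '-', found {op!r}")
        number(next(tokens, None))
        code.append("add" if op == "+" else "sub")
    return code


print(compile_single_pass("2 + 3 - 1"))
```

Because the token generator is consumed on demand, a syntax error stops compilation before the rest of the input is ever scanned.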
O Multi pass: each pass finishes before the next starts
O Saves main memory; passes communicate
through files
O Used if the language is complex or
portability is important
source file --> Scanner --> Parser --> ... --> Code gen --> object file
Front end & Back end
O Front end: the phases, or parts of phases,
that depend on the source language.
O Back end: the phases, or parts of phases, that
depend on the target machine.
Front ends (language dependent): Java, C, Pascal
Back ends (machine dependent): Pentium, PowerPC
Any combination is possible
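The front end and back end meet only at the IR, which is what makes any combination possible. A minimal sketch, assuming a made-up IR of `(op, dest, src1, src2)` tuples and two hypothetical back ends whose output formats are not any real assembly language; the same IR is fed unchanged to either one.

```python
from typing import Callable

# Shared IR that any front end could emit: (op, dest, src1, src2) tuples.
IRInstr = tuple[str, str, object, object]

STACK_OPS = {"+": "add", "*": "mul"}
REGISTER_OPS = {"+": "ADD", "*": "MUL"}


def stack_backend(ir: list[IRInstr]) -> list[str]:
    """HYPOTHETICAL stack-machine target: push operands, operate, pop result."""
    out: list[str] = []
    for op, dest, a, b in ir:
        out += [f"push {a}", f"push {b}", STACK_OPS[op], f"pop {dest}"]
    return out


def register_backend(ir: list[IRInstr]) -> list[str]:
    """HYPOTHETICAL three-operand register-machine target."""
    return [f"{REGISTER_OPS[op]} {dest}, {a}, {b}" for op, dest, a, b in ir]


BACKENDS: dict[str, Callable[[list[IRInstr]], list[str]]] = {
    "stack": stack_backend,
    "register": register_backend,
}

# IR for "t2 = (x + 4) * 8", as some front end might produce it.
ir: list[IRInstr] = [("+", "t1", "x", 4), ("*", "t2", "t1", 8)]
for name, backend in BACKENDS.items():
    print(name, backend(ir))
```

Adding a new source language means writing one new front end that emits this IR; adding a new machine means writing one new back end that consumes it.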