
Compiler design

 Text Book: Compilers: Principles, Techniques, and Tools by Aho, Sethi, and Ullman.
 Topics:
1. Compiler phases
2. Lexical analysis
3. Syntax analysis
4. Code generation
 Homework: there will be two major programming assignments. They must be done independently.
 Examinations: there will be two hourly exams and a final
exam.
 Grading: the total grade will be computed as follows:
 20% for each hourly exam
 15% for the home work
 50% for the final exam.
Overview of Compiler

 A compiler is a program (itself written in a high-level language) that converts / translates / compiles a source program written in a high-level language into equivalent machine code.
source program → compiler → machine code (object code)
What is a Compiler?
 Definition: A compiler is a program
that translates one language to
another
 Usually, the translation takes place
between a high-level language and
a low-level language
 Clearly, our first step is to discuss
some terminology…
Terminology
 Source language – the language that
is being translated
 Object language – the language into
which the translation is being done
 High-level language – a language that
is far removed from a computer; one
which is close to the problem area(s)
for which the language is designed
Terminology…
 Low-level language – a language that
is close to the machine (computer)
upon which the language will run
(execute)
 Object language – (sometimes called
machine code) the language of some
computer. This language usually is
not human readable (and is expressed
in bits or hex)
Terminology…
 Intermediate language – a language that
is used either:
 because it is a temporary step in the
translation process; or,
 because it is neither particularly high nor low, and is the output of a translation
 Assembly language – a language that
translates almost one-to-one to machine
language, but is in human readable form
What’s a Compiler?...
 Today, compilers are written using high-
level languages (such as Java, C++, etc.)
 The earliest compilers were written using
assembly language (e.g., FORTRAN and
COBOL around 1954)
 Sometimes a compiler is written in the same language that it compiles. This is done through bootstrapping.
Why Should I learn Compiler
Construction?
 How do compilers work?
 How do computers work? (instruction set,
registers, addressing modes, run time data
structures, …)
 What machine code is generated for certain
language constructs? (efficiency
considerations)
 Getting "a feeling" for good language design
Why Compilers? A Brief History
 The first computers were “hard-
wired”
 That is, they were collections of
physical devices that connected to
one-another, in an assemblage
designed to calculate particular kinds
of results
Why Compilers? A Brief
History…
 For example, Babbage’s Analytic
Engine and his Difference Engine
were assemblages of gears that
solved numeric problems
 The primary driving force was the
calculation of ballistics tables for
artillery
 Jacquard’s loom is another example
 And Hollerith’s work for the US Census Bureau is another
Why Compilers? A Brief
History…
 In the late 1940’s John von Neumann
“invented” the stored program
computer
 The “invention” is the observation that just as you can store data in the memory of a computer, that data can itself be machine instructions
 Then the computer can not only take
its instructions from memory…
Why Compilers? A Brief
History…
 But the computer can modify the
instructions in its memory…
 And, in fact, can write its own
programs, storing them in memory
 It quickly became apparent that the
simplest way to store information in a
computer was in the form of binary
numbers
Why Compilers? A Brief
History…
 So, to program a computer, you only
needed to enter a sequence of binary
numbers into memory, and then tell
the computer at which memory
address to start execution
 This was programming in machine
language
 Instructions (and data) were entered
from a console, one word (in binary) at
a time…
Why Compilers? A Brief
History…
 This form of coding (note the word!)
quickly was replaced by programming
in assembly language
 A program was written (in machine
language) which translated assembly
language to machine language (called
an assembler)
Why Compilers? A Brief
History…
 After the first assembler was written,
no one needed to code in machine
language any longer
 But, coding x = 3; can take many
instructions…
 So, the thought was – can we create a
program that translates something like
x = 3; into assembly language or into
machine language?
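As a rough illustration (using the same made-up register-and-pseudo-instruction notation that appears in the target-code example later in these notes, not any real machine), even the single statement x = 3; might become something like:

   r1 = 3
   store r1, x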
Why Compilers? A Brief History.
Formal Languages
 About the same time, in the mid-
1950’s, Noam Chomsky (M.I.T.)
began investigating the formal
structure of natural languages
 His work led to the Chomsky
hierarchy of type 0, 1, 2, 3 languages
and their associated grammars
Why Compilers? A Brief History.
Formal Languages…
 The type 2 (context-free) grammars
turned out to be very good at
describing computer languages
 And, efficient ways to recognize the structure of a source program using a type 2 grammar were developed
 Such recognition is called parsing
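For example, a tiny context-free grammar for arithmetic expressions (a generic illustration, not taken from the textbook) can be written as:

   expr   →  expr + term   |  term
   term   →  term * factor |  factor
   factor →  ( expr )  |  identifier  |  number

The parser uses rules of this kind to recognize the structure of the source program.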
Why Compilers? A Brief History.
Formal Languages…
 Very closely related to context-free
grammars are the type 3 grammars
 These are the regular grammars, equivalent in power to finite automata (and to regular expressions)
 An entire sub-branch of mathematics
studies automata; it’s called
automata theory
Why Compilers? A Brief History.
Formal Languages…
 It turns out that type 3 (regular)
grammars are very good at describing
the “atoms” used in computer
languages
 These “atoms” are the reserved words,
symbols, and user-defined words that
are used in a computer language
 Recognizing atoms is called scanning
(or lexing)
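As a minimal C sketch of scanning (the function name and output format are invented for illustration; real scanners are usually produced by a scanner generator), recognizing identifiers, integer constants, and single-character symbols might look like:

   #include <ctype.h>
   #include <stdio.h>

   /* Minimal scanner sketch: prints one token per line for a source string. */
   void scan(const char *p)
   {
       while (*p) {
           if (isspace((unsigned char)*p)) { p++; continue; }
           if (isalpha((unsigned char)*p) || *p == '_') {      /* identifier or keyword */
               const char *start = p;
               while (isalnum((unsigned char)*p) || *p == '_') p++;
               printf("IDENT   %.*s\n", (int)(p - start), start);
           } else if (isdigit((unsigned char)*p)) {            /* integer constant */
               const char *start = p;
               while (isdigit((unsigned char)*p)) p++;
               printf("NUMBER  %.*s\n", (int)(p - start), start);
           } else {                                            /* single-character symbol */
               printf("SYMBOL  %c\n", *p++);
           }
       }
   }

   int main(void) { scan("a = 100;"); return 0; }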
Why Compilers? A Brief
History…
 By far the most difficult and
complicated problem has been how
to generate object code that is
concise, and most importantly,
executes efficiently
 This is called “optimization”
Why Compilers? A Brief
History…
 Far simpler are the front-end issues of scanning and parsing – recognizing the source code
 This is due to the fact that we’ve
developed (semi-) automatic ways to
create scanners and parsers…
 using scanner generators and parser
generators
Programs Related to
Compilers…
 Interpreters – directly execute
the code upon recognition;
usually statement by statement
 Assemblers – translate
assembly language to machine
language
 Macro Assemblers – ditto, but
with (powerful) macro
capabilities
Programs Related to
Compilers…
 Linkers – combine object
modules to produce an
executable module
 Linkage Editors – manage the
linking process, and are able to
create/maintain object libraries
Programs Related to
Compilers…
 Loaders – load executable
modules into memory, and
launch execution
 Dynamic Loaders – loaders that
stay around during execution to
handle the loading of DLLs
(dynamically linked libraries)
Programs Related to
Compilers…
 Preprocessors – usually a
separate program whose input is
source code and whose output is
source code; perform macro
expansion, comment deletion,
etc. Sometimes the first phase
of a compiler
Programs Related to
Compilers…
 Editors – allow the user to create and
update source code
 Smart Editors – include syntax
coloring, parenthesis balancing, etc.
 Debuggers – a program that provides
an environment in which code may be
debugged; including single stepping,
symbol tables, etc.
Programs Related to
Compilers…
 IDEs – integrated development
environments; provide integrated
editor-debugger-execution
environments
 Profilers – collect statistics about
where programs spend their time
during execution; important for
optimizing at the source code level
Programs Related to
Compilers…
 Project Managers – programs that
help software managers deal with
hundreds or thousands of modules;
build reports, etc.
 SCCS – source code control systems;
provide for multiple access to shared
code in a controlled manner
The Translation Process
 The translation process consists of
a collection of phases, with the output
of one phase feeding the input of the
next
 The original source code is
transformed into a sequence of
intermediate representations (IRs)
during this process
The Translation Process… (diagram of the compiler phases)
Phases of Compiler
Parallel to all other phases are two
activities:
 Symbol table manipulation. The symbol table is one of the primary data structures that a compiler uses; it is used by all of the phases (a sketch of one entry follows below).
 Error detection and handling
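As a rough C sketch (the field names are invented for illustration; the textbook does not prescribe this layout), one entry of a symbol table might be declared as:

   /* One entry per declared name; every phase can look names up here. */
   struct symbol {
       char          *name;    /* identifier as it appears in the source      */
       int            type;    /* e.g. an enum value such as TYPE_INT         */
       int            offset;  /* storage location, filled in by later phases */
       struct symbol *next;    /* simple chaining, e.g. within a hash bucket  */
   };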
The Scanner
 The scanner reads the source
program, as a stream of characters,
and it performs lexical analysis –
collecting sequences of characters
into meaningful units called tokens
 The scanner also may create a
symbol table and a literal table
The Parser
 The parser reads the tokens produced
by the scanner and performs
syntactic analysis – creating an IR (a
parse tree or a syntax tree) showing
the structure of the program
 Syntax trees (abstract syntax trees)
are reduced representations of the
parse tree, with many irrelevant nodes
eliminated
The Semantic Analyzer
 The semantics of a program are its
“meaning” – what it is intended to
accomplish
 The semantic analyzer creates an
intermediate data structure that
contains this meaning – these are the
static semantics
 The dynamic semantics of a program
only can be determined by executing
the program
The Semantic Analyzer…
 An example of the static semantics of
a program is the data types of the
variables (and expressions)
 These static semantics usually are
represented in the intermediate
representations (IRs) as attributes
 The IR usually is a tree, “decorated”
with these attributes
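A minimal C sketch of such a decorated tree node (the type and field names are invented for illustration):

   /* An expression-tree node; 'type' is the static-semantic attribute
      that the semantic analyzer fills in, and 'value' holds a literal
      when kind == NUM. */
   struct ast_node {
       int              kind;   /* e.g. ASSIGN, PLUS, CALL, NUM, IDENT  */
       int              type;   /* attribute: e.g. TYPE_INT, TYPE_FLOAT */
       int              value;  /* literal value, used when kind == NUM */
       struct ast_node *left;
       struct ast_node *right;
   };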
(Source) Code Optimization
 Optimization may occur during
several phases
 Source code optimization rearranges
the source (or the IR of the source) in
order to produce better results
 E.g., x = 7 + 9; can become x = 16;
 This is called constant folding
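A hedged sketch of how a compiler might fold such a constant (assuming the illustrative ast_node layout sketched earlier, where PLUS and NUM are made-up kind codes):

   /* Fold 'lhs + rhs' when both operands are numeric literals. */
   void fold_add(struct ast_node *node)
   {
       if (node->kind == PLUS &&
           node->left->kind == NUM && node->right->kind == NUM) {
           node->kind  = NUM;                                     /* subtree becomes a literal */
           node->value = node->left->value + node->right->value;  /* computed at compile time  */
       }
   }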
(Source) Code Optimization…
 Duplicated computations can be
saved as temporaries and then
their values re-used
 Recursion can be converted to
iteration
 Repeated calculations can be moved
out of loops (see the sketch after this list)
 The possibilities are endless…
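For instance, a repeated calculation can be moved out of a loop (a generic C fragment; declarations omitted):

   /* before: a * b is recomputed on every iteration */
   for (i = 0; i < n; i++)
       x[i] = a * b + i;

   /* after: the loop-invariant product is computed once */
   t = a * b;
   for (i = 0; i < n; i++)
       x[i] = t + i;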
The Code Generator
 The code generator takes the IR and
generates code for the target machine
 Here the details of how various
numeric and non-numeric quantities
are represented become important
 E.g., word length, hardware stack,
hardware calling conventions, memory
access, etc.
The Target Code Optimizer
 The target code optimizer examines
the emitted target code to see if
further possibilities for optimization
are present and then capitalizes upon
them
 E.g., reuse of registers, using a shift
instruction to replace a multiplication
or division, etc.
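As a C-level illustration of replacing a multiplication with a shift (the optimizer actually performs this on the emitted instructions, but the effect is the same):

   y = x * 8;     /* original: multiply by a power of two    */
   y = x << 3;    /* optimized: shift left by 3, same value  */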
Phases of the compiler
Source Program
   ↓
Scanner (Lexical Analyzer)
   ↓  Tokens
Parser (Syntax Analyzer)
   ↓  Parse Tree
Semantic Analyzer
   ↓  Abstract Syntax Tree with attributes
Sample Program Compiled
 Consider the example:
   int a, b{
     a = 100;
     b = f (a) + 3}

Source Program → Lexical Analyzer → Token stream
Sample Program Compiled
Tokens are the entities of interest defined by the compiler writer. A sequence of characters with a collective meaning is grouped to form a token.
 Examples of Tokens:
Single-character operators: =  +  -  *  >  <
Multi-character operators: ++  --  ==  <=
 Numeric Constants: 1997 45.89 19.9e+7
 Key Words: int, while, for
 Identifiers: x, my_name, Your_Name, a
 Homework: Identify all token types in C programs.
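One hedged C sketch of how a compiler writer might enumerate these token types (the names are invented for illustration):

   enum token_type {
       TOK_KEYWORD,    /* int, while, for, ...          */
       TOK_IDENT,      /* x, my_name, Your_Name, a, ... */
       TOK_NUMBER,     /* 1997, 45.89, 19.9e+7          */
       TOK_OPERATOR,   /* =  +  -  *  ++  --  ==  <=    */
       TOK_PUNCT       /* ;  ,  {  }  (  )              */
   };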
Example Program Compiled-Continued
What are the tokens in the example?

#   Token   Type                #    Token   Type
1.  int     keyword             8.   =       equal
2.  a       identifier          9.   100     integer
3.  ,       comma               10.  ;       semicolon
4.  b       identifier          11.  b       identifier
5.  {       left brace          12.  f       identifier
6.  a       identifier          13.  +       plus
7.  (       left parenthesis    14.  3       integer
Example Continued
 The parser produces a parse tree: it is a heterogeneous tree (nodes have different data types)

root_node
 |-- stmt1: =
 |     |-- a
 |     `-- 100
 `-- stmt2: =
       |-- b
       `-- +
             |-- f ( a )
             `-- 3
Intermediate-Code Generation
 Using temporary locations to save values
 t1 = 100
 store t1, a
 load a, t2
 t3 = f(t2)
 t4 = t3 + 3
 store t4, b
Intermediate-Code Optimization
 Eliminate unnecessary code or
statements that won't be executed
 t1 = 100
 store t1, a
 t3 = f(t1)
 t4 = t3 + 3
 store t4, b
Target-code Generation
 Machine code generated for a
particular target machine
 r1 = 100
 store r1, 0x10
 jsr _f
 r2 = r0 + 3
 store r2, 0x16
Compiler Architecture
Single-pass vs. multi-pass architecture
 Single pass: all phases interleaved, driven by the parser
 Multi-pass:
   Each pass finishes before the next starts
   Saves main memory; passes communicate through files
   Used if the language is complex or portability is important
Front end & Back end
 Front end: the phases, or parts of phases,
that depend on the source language.
 Back end: the phases, or parts of phases,
that depend on the target machine.