
CSE4310 – Compiler Design

CH1 - Introduction
Outline
• Introduction – Definition
• Why Study Compilers?
• Classification of Compilers
• Cousins of Compilers
– Language Processing System
– Compilers vs Interpreters
• Basic Compiler Design
– The Analysis – Synthesis Model of Compilation
• Phases of a Compiler
• Compiler Construction Tools
Introduction - Definition
• A compiler is an executable program that can read a program in one high-level language and translate it into an equivalent executable program in machine language.
• A compiler is a computer program that translates a program written in a source language into an equivalent program in a target language.
– A source program/code is a program/code written in the source language, which is usually a high-level language.
– A target program/code is a program/code written in the target language, which is often a machine language or an intermediate code (object code).
Introduction - Definition
• As a discipline, compiler design draws on multiple
Computer Science and Engineering courses, including:
o Programming Languages
o Data Structures and Algorithms
o Theory of Computation (Automata and Formal Language
Theory)
o Assembly Language
o Software Engineering
o Computer Architecture
o Operating Systems and
o Discrete Mathematics
Why Study Compilers?
• Curiosity
• A prerequisite for developing advanced compilers, an area that remains
active as new computer architectures emerge
• To improve the capabilities of an existing compiler/interpreter
• To write more efficient code in a high-level language
• Useful for developing software tools that parse program code or other strings
– E.g., editors, debuggers, interpreters, preprocessors, …
• Understanding how compilers work helps you program more effectively
• To gain a solid foundation in parsing theory for writing parsers
• Compiler design makes an excellent “capstone” project
• To apply almost all of the major computer science fields, such as:
– automata theory, computer programming, programming language design theory,
assembly language, computer architecture, data structures, algorithms, and
software engineering.
Why Study Compilers?
• Compilers are everywhere; applications of compiler
technology include:
– Parsers for HTML in web browser
– Interpreters for JavaScript/Flash
– Machine code generation for high level languages
– Software testing
– Program optimization
– Malicious code detection
– Design of new computer architectures
– Compiler-in-the-loop hardware development
• Hardware synthesis: VHDL to RTL translation
– Compiled simulation
• Used to simulate designs written in VHDL
Why Study Compilers?
• Compiler algorithms draw on results from mathematical logic,
lattice theory, linear algebra, probability theory, etc.
– type checking, static analysis, dependence analysis and loop parallelization,
cache analysis, etc.
• The compiler algorithms make practical application of
o Greedy algorithms – register allocation
o Heuristic search – list scheduling
o Graph algorithms – dead code elimination, register allocation
o Dynamic programming – instruction selection
o Optimization techniques – instruction scheduling
o Finite automata – lexical analysis
o Pushdown automata – parsing
o Fixed point algorithms – data-flow analysis
o Complex data structures – symbol tables, parse trees, data dependence graphs
o Computer architecture – machine code generation
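One pairing from the list above — finite automata for lexical analysis — can be sketched as a small table-driven DFA. This Python sketch is illustrative only (the state names and character classes are assumptions, not part of the course material); it accepts identifiers of the form letter(letter|digit)*:

```python
# A minimal table-driven DFA accepting identifiers: letter (letter|digit)*

def char_class(c):
    """Map a character to one of the DFA's input classes."""
    if c.isalpha() or c == "_":
        return "letter"
    if c.isdigit():
        return "digit"
    return "other"

# Transition table: state -> {input class -> next state}
TRANSITIONS = {
    "start": {"letter": "in_id"},
    "in_id": {"letter": "in_id", "digit": "in_id"},
}
ACCEPTING = {"in_id"}

def is_identifier(s):
    state = "start"
    for c in s:
        state = TRANSITIONS.get(state, {}).get(char_class(c))
        if state is None:          # no transition: reject immediately
            return False
    return state in ACCEPTING

print(is_identifier("rate60"))     # True
print(is_identifier("60rate"))     # False
```

Real scanner generators such as Lex build exactly this kind of transition table automatically from regular expressions.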
Classifications of Compilers
• Compilers are viewed from many perspectives:
– Classifying compilers by the number of passes has its background in the hardware
resource limitations of early computers.
• Compiling involves a great deal of work, and early computers did not have
enough memory to contain one program that did all of it.
• So compilers were split up into smaller programs, each of which made a pass
over the source (or some representation of it), performing some of the
required analysis and translation.
• However, all compilers perform the same basic tasks to accomplish their actions.


Classifications of Compilers
1. Single (One) Pass Compilers:- a compiler that passes through the
source code of each compilation unit only once.
– Also called narrow compilers.
– The ability to compile in a single pass has classically been seen as a benefit because
it simplifies the job of writing a compiler.
– Single-pass compilers generally perform compilations faster than multi-pass
compilers.
– Due to the resource limitations of early systems, many early languages were
specifically designed so that they could be compiled in a single pass (e.g., Pascal).
• Disadvantages of single-pass compilers:
– It is not possible to perform many of the sophisticated optimizations needed to
generate high quality code.
– It can be difficult to count exactly how many passes an optimizing compiler makes.
– For instance, different phases of optimization may analyze one expression many
times but only analyze another expression once.
Classifications of Compilers
2. Multi-Pass Compilers:- a compiler that processes the source
code or abstract syntax tree of a program several times.
– Also called wide compilers.
– Phases are separate “programs” that run sequentially.
– By splitting the compiler up into small programs, a correct
compiler is easier to produce.
– Proving the correctness of a set of small programs often requires
less effort than proving the correctness of a larger, single,
equivalent program.
– Many programming languages cannot be compiled in a single
pass; for example, more recent languages such as Java
require a multi-pass compiler.
Classifications of Compilers
3. Load and Go Compilers:- generate machine code and then
immediately execute it.
– Compilers usually produce either absolute code, which is executed
immediately upon conclusion of the compilation, or object code,
which is transformed by a linking loader into absolute code.
– These compiler organizations are called Load & Go and
Link/Load.
– Both Load & Go and Link/Load compilers use a number of passes
to translate the source program into absolute code.
– A pass reads some form of the source program, transforms it
into another form, and normally outputs this form to an
intermediate file, which may be the input of a later pass.
Classifications of Compilers
4. Optimizing Compilers:- a compiler that tries to minimize or
maximize some attributes of an executable computer program.
– The most common requirement is to minimize the time taken to execute
a program; a less common one is to minimize the amount of memory
occupied.
– The growth of portable computers has created a market for minimizing
the power consumed by a program.
– Compiler optimization is generally implemented as a sequence of
optimizing transformations: algorithms that take a program and
transform it into a semantically equivalent output program that
uses fewer resources.
– Types of optimization include peephole optimization, local
optimization, global optimization, loop optimization, and machine-code
optimization.
Cousins of Compilers
Language Processing System
Cousins of Compilers
A. Assembler:- is a translator that converts programs written in assembly language into
machine code.
– It translates mnemonic operation codes to their machine-language equivalents.
– It assigns machine addresses to symbolic labels.

B. Interpreter:- is a computer program that translates high-level instructions/programs
into machine code as they are encountered.
– It produces the output of each statement as it is interpreted
– It generally uses one of the following strategies for program execution:
• Execute the source code directly
• Translate the source code into some efficient intermediate representation and immediately execute it
• Explicitly execute stored precompiled code made by a compiler that is part of the interpreter
system
Cousins of Compilers

C. Linker:- is a program that takes one or more object files generated by a
compiler and combines them into a single executable program.
D. Loader:- is the part of an operating system that is responsible for loading
programs from executables (i.e., executable files) into memory, preparing
them for execution, and then executing them.
Cousins of Compilers
Compiler vs. Interpreter

• Most languages are usually thought of as using either one or the other:
– Compilers: FORTRAN, COBOL, C, C++, Pascal, PL/1
– Interpreters: Lisp, Scheme, BASIC, APL, Perl, Python, Smalltalk
• BUT: not always implemented this way
Basic Compiler Design
• Involves writing a huge program that takes as input another
program in the source language for the compiler, and gives
as output an executable that we can run.
– To make the code easy to modify, a modular design
(decomposition) methodology is usually used to design a compiler.
• There are two design strategies:
1. Write a “front end” of the compiler (i.e. the lexer, parser,
semantic analyzer, and assembly tree generator), and write a
separate back end for each platform that you want to support
2. Write an efficient, highly optimized “back end”, and write a
different front end for each of several languages, such as
Fortran, C, C++, and Java.
Basic Compiler Design
The Analysis-Synthesis Model of Compilation
• There are two parts to compilation: analysis and synthesis.
– During analysis, the operations implied by the source program are
determined and recorded in a hierarchical structure called a tree.
 Breaks up the source program into constituent pieces
 Creates an intermediate representation of the source program
– During synthesis, the translated code is produced.
 Constructs the target program from the intermediate representation
 Takes the tree structure and translates the operations into the target program
Basic Compiler Design
Analysis
• In compiling, analysis has three phases:
1. Linear analysis: stream of characters read from left-to-right and grouped into
tokens; known as lexical analysis or scanning
• Converting input text into stream of known objects called tokens.
• It simplifies parsing process
2. Hierarchical analysis: tokens grouped hierarchically with collective meaning;
known as parsing or syntax analysis
• Translating code to rules of grammar.
• Building representation of code.
3. Semantic analysis: check if the program components fit together meaningfully
• Checks source program for semantic errors
• Gathers type information for subsequent code generation (type checking)
• Identifies operator and operands of expressions and statements
Basic Compiler Design
Synthesis
• The synthesis of the target program is
composed of two phases, which are:
– Code Optimization, and
– Code Generation.
• The synthesis part of the compilation process is
also called the back-end of a compiler.
Phases of a Compiler

Phase I: Lexical Analysis (Scanning)
Phase II: Syntax Analysis (Parsing)
Phase III: Semantic Analysis
Phase IV: Intermediate Code Generation
Phase V: Assembly Code Generation
Phase VI: Machine Code Generation and Linking
Phases of a Compiler
Phase I: Lexical Analysis (Scanning)
• This is the low-level text processing portion of the compiler
• The source file, a stream of characters, is broken into meaningful chunks called tokens.
– For example:
void main()
{
int x;
x=3;
}
• The lexical analyzer (scanner) reads a stream of characters and puts them together
into some meaningful (with respect to the source language) units called tokens.
– Typically, spaces, tabs, end-of-line characters and comments are ignored by the lexical
analyzer.
• To design a lexical analyzer:
– Input a description (regular expressions) of the tokens in the language, and output a lexical
analyzer (a program).
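A scanner of the kind described above can be sketched directly from a description of the tokens as regular expressions. The Python sketch below is illustrative (the token names and patterns are assumptions, not from the slides); it tokenizes the `x=3;` style of input shown earlier, skipping whitespace:

```python
import re

# Token specification: (name, regular expression) — an illustrative set
TOKEN_SPEC = [
    ("NUM",    r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("SEMI",   r";"),
    ("SKIP",   r"\s+"),          # spaces, tabs, newlines are ignored
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Yield (token_name, lexeme) pairs for the input string."""
    for m in MASTER.finditer(source):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

print(list(tokenize("int x; x = 3;")))
# [('ID', 'int'), ('ID', 'x'), ('SEMI', ';'), ('ID', 'x'),
#  ('ASSIGN', '='), ('NUM', '3'), ('SEMI', ';')]
```

This mirrors the design stated above: the input is a description of the tokens (regular expressions), and the output is a working lexical analyzer.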
Phases of a Compiler
Phase II: Syntax Analysis (Parsing)
• A parser gets a stream of tokens from the scanner, and determines if the syntax
(structure) of the program is correct according to the (context-free) grammar of the
source language.
• Then, it produces a data structure, called a parse tree or an abstract syntax tree,
which describes the syntactic structure of the program.
• The parser
– Ensures that the sequence of tokens returned by the lexical analyzer forms a syntactically
correct program
– It also builds a structured representation of the program called an abstract syntax tree that is
easier for the type checker to analyze than a stream of tokens
– It catches the syntax errors as the statement below:
if if (x > 3) then x = x + 1
• Context-free grammars will be used (as the input) by the parser generator to describe
the syntax of the compiling language
• Most compilers do not generate a parse tree explicitly but rather go to intermediate
code directly as syntax analysis takes place.
Phases of a Compiler
Phase II: Syntax Analysis (Parsing)
Parse Tree
• The output of parsing, giving a top-down description of the program syntax
• Root node is entire program and leaves are tokens that were identified during lexical analysis
• Constructed by repeated application of rules in Context Free Grammar (CFG)
• Syntax structures are analyzed by DPDA (Deterministic Push Down Automata)
• Example: parse tree for position := initial + rate*60
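The tree for `position := initial + rate*60` can be represented in code as nested nodes, just as a parser would build it by repeated application of CFG rules. In this illustrative Python sketch the tuple encoding is an assumption, not the course's notation:

```python
# Each node is (label, children...); the leaves are tokens from lexical analysis.
tree = ("assign",
        ("id", "position"),
        ("+",
         ("id", "initial"),
         ("*",
          ("id", "rate"),
          ("num", 60))))

def show(node, depth=0):
    """Print the tree with indentation, root first (top-down)."""
    label, *children = node
    print("  " * depth + str(label))
    for child in children:
        if isinstance(child, tuple):
            show(child, depth + 1)
        else:
            print("  " * (depth + 1) + str(child))

show(tree)
```

The root corresponds to the whole assignment statement and the leaves are the identifiers and the number 60, matching the description above.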
Phases of a Compiler
Phase III: Semantic Analysis
• It gets the parse tree from the parser, together with information about
some syntactic elements, and determines whether the semantics (meaning)
of the program is correct
– It detects errors in the program, such as using variables before they are
declared or assigning an integer value to a Boolean variable, …
• This part deals with static semantics:
– semantics of programs that can be checked by reading off from the program only.
– syntax of the language that cannot be described by a context-free grammar.
• Mostly, a semantic analyzer does type checking (i.e. gathers type
information for subsequent code generation.)
• It modifies the parse tree in order to get that (static) semantically correct
code
– In this phase, the abstract syntax tree that is produced by the parser is traversed,
looking for semantic errors.
Phases of a Compiler
Phase III: Semantic Analysis
• The main tool used by the semantic analyzer is a symbol table
• Symbol table:- is a data structure with a record for each identifier and its
attributes
– Attributes include storage allocation, type, scope, etc
– All the compiler phases insert and modify the symbol table
• Discovery of meaning in a program using the symbol table involves:
– Do static semantics check
– Simplify the structure of the parse tree ( from parse tree to abstract syntax tree (AST) )
• Static semantics check
– Making sure identifiers are declared before use
– Type checking for assignments and operators
– Checking types and number of parameters to subroutines
– Making sure functions contain return statements
– Making sure there are no repeats among switch statement labels
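A symbol table with the attributes listed above, plus two of the static checks (use before declaration, type checking of assignments), can be sketched as follows. The Python class and method names here are illustrative assumptions, not a prescribed design:

```python
# Minimal symbol table: one record per identifier, holding its attributes.
class SymbolTable:
    def __init__(self):
        self.records = {}

    def declare(self, name, typ, scope="global"):
        """Insert a record with the identifier's attributes."""
        self.records[name] = {"type": typ, "scope": scope}

    def lookup(self, name):
        """Static check: identifiers must be declared before use."""
        if name not in self.records:
            raise NameError(f"'{name}' used before declaration")
        return self.records[name]

def check_assignment(table, target, value_type):
    """Static check: the declared type of the target must match the value."""
    declared = table.lookup(target)["type"]
    if declared != value_type:
        raise TypeError(f"cannot assign {value_type} to {declared} '{target}'")

table = SymbolTable()
table.declare("x", "int")
check_assignment(table, "x", "int")        # OK: types match
# check_assignment(table, "flag", "int")   # would raise NameError
```

Every phase of the compiler consults and updates this same structure, which is why it is listed as the semantic analyzer's main tool.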
Phases of a Compiler
Phase IV: Intermediate Code Generation
• An intermediate code generator takes a parse tree from the semantic
analyzer and generates a program in the intermediate language.
• In some compilers, a source program is translated into an intermediate
code first and then the intermediate code is translated into the target
language.
– In other compilers, a source program is translated directly into the target
language.

• The compiler makes a second pass over the parse tree to produce the
translated code
– If there are no compile-time errors, the semantic analyzer translates the
abstract syntax tree into the abstract assembly tree
– The abstract assembly tree will be passed to the code optimization and
assembly code generation phase
Phases of a Compiler
Phase IV: Intermediate Code Generation
• Using intermediate code is beneficial when compilers that
translate a single source language to many target
languages are required.
– The front end of the compiler (scanner through intermediate code
generator) can be shared by all of them.
– A different back end (code optimizer and code generator) is
required for each target language.
• One of the popular intermediate code is three-address
code.
• A three-address code instruction is in the form of
x = y op z
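Generating three-address code of the form `x = y op z` from an expression tree can be sketched as a post-order walk that introduces temporaries `t1, t2, …` for intermediate results. The tree encoding below is an illustrative assumption:

```python
import itertools

def gen_tac(node, code, temps):
    """Emit three-address instructions for an expression tree.

    A node is either a string (a variable or constant) or a tuple
    (op, left, right). Returns the name holding the node's value.
    """
    if isinstance(node, str):
        return node
    op, left, right = node
    l = gen_tac(left, code, temps)
    r = gen_tac(right, code, temps)
    t = f"t{next(temps)}"                # fresh temporary for the result
    code.append(f"{t} = {l} {op} {r}")   # exactly one operator per instruction
    return t

code, temps = [], itertools.count(1)
result = gen_tac(("+", "initial", ("*", "rate", "60")), code, temps)
code.append(f"position = {result}")
print("\n".join(code))
# t1 = rate * 60
# t2 = initial + t1
# position = t2
```

Note how the tree from the earlier parse-tree example flattens into a sequence where each instruction has at most one operator, which is exactly the three-address property.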
Phases of a Compiler
Phase V: Assembly Code Generation
• The code generator converts the abstract assembly tree into the
actual assembly code
• To do code generation:
– The generator covers the abstract assembly tree with tiles (each
tile represents a small portion of an abstract assembly tree), and
– Outputs the actual assembly code associated with the tiles
used to cover the tree
Phase VI: Machine Code Generation and Linking
• The final phase of compilation converts the assembly
code into machine code and links (via a linker) the
appropriate language libraries
Code Optimization
• Is a process of replacing an inefficient sequence of instructions with a
better sequence of instructions.
• Sometimes called code improvement.
• Code optimization can be done:
– after semantic analysis: performed on the parse tree
– after intermediate code generation: performed on the intermediate code
– after code generation: performed on the target code
• There are two types of code optimizations
1. Local Optimization
2. Global Optimization
Code Optimization
1. Local Optimization
– The compiler looks at a very small block of instructions and tries to determine
how it can improve the efficiency of this local code block.
– Relatively easy; included as part of most compilers
– Examples of possible local optimizations
1. Constant evaluation
2. Strength reduction
3. Eliminating unnecessary operations

2. Global Optimization
– The compiler looks at large segments of the program to decide how to
improve performance
– Much more difficult; usually omitted from all but the most sophisticated and
expensive production-level “optimizing compilers”
– Optimization cannot make an inefficient algorithm efficient
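Two of the local optimizations named above can be sketched on three-address-style instructions: constant evaluation (folding an operation whose operands are both known at compile time) and strength reduction (replacing a multiply by a power of two with a cheaper shift). The tuple instruction format is an illustrative assumption:

```python
# Each instruction is (dest, left, op, right); operands are ints or names.
def optimize_local(instr):
    dest, left, op, right = instr
    # Constant evaluation: both operands known at compile time
    if isinstance(left, int) and isinstance(right, int) and op in ("+", "*"):
        value = left + right if op == "+" else left * right
        return (dest, value, "const", None)
    # Strength reduction: x * 2**k  ->  x << k
    if op == "*" and isinstance(right, int) and right > 0 \
            and right & (right - 1) == 0:
        return (dest, left, "<<", right.bit_length() - 1)
    return instr                     # nothing applies: leave it unchanged

print(optimize_local(("t1", 3, "+", 4)))    # ('t1', 7, 'const', None)
print(optimize_local(("t2", "x", "*", 8)))  # ('t2', 'x', '<<', 3)
```

Both rewrites look at a single instruction in isolation, which is what makes them *local*: no analysis of the surrounding program is needed.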
Summary of Phases of a Compiler
• Programmer (source code producer)
– Output: Source string
– Sample: A=B+C;
• Scanner (performs lexical analysis)
– Output: Token string, and a symbol table with names
– Sample: ‘A’, ‘=’, ‘B’, ‘+’, ‘C’, ‘;’
• Parser (performs syntax analysis based on the grammar of the programming language)
– Output: Parse tree or abstract syntax tree
– Sample:
      ;
      |
      =
     / \
    A   +
       / \
      B   C
• Semantic analyzer (type checking, ...)
– Output: Annotated parse tree or abstract syntax tree
• Intermediate code generator
– Output: Three-address code, quads, or RTL
– Sample:
    int2fp B t1
    + t1 C t2
    := t2 A
• Optimizer
– Output: Three-address code, quads, or RTL
– Sample:
    int2fp B t1
    + t1 #2.3 A
• Code generator
– Output: Assembly code
– Sample:
    MOVF #2.3, r1
    ADDF2 r1, r2
    MOVF r2, A
Compiler Construction Tools
• Various tools are used in the construction of the
various parts of a compiler.
– Using these tools, it is now relatively easy to construct a
compiler.
• The following are software development tools
available to implement one or more compiler phases
 Scanner generators
 Parser generators
 Syntax-directed translation engines
 Automatic code generators
 Data Flow Engines
Compiler Construction Tools
• Scanner generators for C/C++: Flex, Lex.
• Parser generators for C/C++: Bison, YACC.
• Available scanner generators for Java:
– JLex, a scanner generator for Java, very similar to Lex.
– JFlex, flex for Java.
• Available parser generators for Java:
– CUP, a parser generator for Java, very similar to YACC.
– BYACC/J, a different version of Berkeley YACC for Java. It is an extension of the
standard YACC (a -j flag has been added to generate Java code).
• Other compiler tools:
– JavaCC, a parser generator for Java, including scanner generator and parser
generator. Input specifications are different than those suitable for Lex/YACC.
Also, unlike YACC, JavaCC generates a top-down parser.
– ANTLR, a set of language translation tools (formerly PCCTS). Includes
scanner/parser generators for C, C++, and Java.
End of CH1!
• Thank You!!!
