
CSE4310 – Compiler Design

CH1 - Introduction
Outline
• Introduction – Definition
• Why Study Compilers?
• Classification of Compilers
• Cousins of Compilers
– Language Processing System
– Compilers vs Interpreters
• Basic Compiler Design
– The Analysis – Synthesis Model of Compilation
• Phases of a Compiler
• Compiler Construction Tools
Introduction - Definition
• A compiler is an executable program that can read a program in one high-level language and translate it into an equivalent executable program in machine language.
• A compiler is a computer program that translates a program written in a source language into an equivalent program in a target language.
– A source program/code is a program/code written in the source language, which is usually a high-level language.
– A target program/code is a program/code written in the target language, which is often a machine language or an intermediate code (object code).
Introduction - Definition
• As a discipline, compiler design draws on multiple
Computer Science and Engineering courses, including:
o Programming Languages
o Data Structures and Algorithms
o Theory of Computation (Automata and Formal Language
Theory)
o Assembly Language
o Software Engineering
o Computer Architecture
o Operating Systems and
o Discrete Mathematics
Why Study Compilers?
• Curiosity
• A prerequisite for developing advanced compilers, an area that remains
active as new computer architectures emerge
• To improve the capabilities of an existing compiler/interpreter
• To write more efficient code in a high-level language
• Useful for developing software tools that parse program code or other strings
– E.g., editors, debuggers, interpreters, preprocessors, …
• Understanding how compilers work helps you program more effectively
• To gain a solid foundation in parsing theory for writing parsers
• Compiler design makes an excellent “capstone” project
• To apply almost all of the major computer science fields, such as:
– automata theory, computer programming, programming language design theory,
assembly language, computer architecture, data structures, algorithms, and
software engineering.
Why Study Compilers?
• Compilers are everywhere; applications of compiler
technology include:
– Parsers for HTML in web browser
– Interpreters for JavaScript/Flash
– Machine code generation for high level languages
– Software testing
– Program optimization
– Malicious code detection
– Design of new computer architectures
– Compiler-in-the-loop hardware development
• Hardware synthesis: VHDL to RTL translation
– Compiled simulation
• Used to simulate designs written in VHDL
Why Study Compilers?
• Compiler algorithms draw on results from mathematical logic,
lattice theory, linear algebra, probability theory, etc.
– type checking, static analysis, dependence analysis and loop parallelization,
cache analysis, etc.
• The compiler algorithms make practical application of
o Greedy algorithms – register allocation
o Heuristic search – list scheduling
o Graph algorithms – dead code elimination, register allocation
o Dynamic programming – instruction selection
o Optimization techniques – instruction scheduling
o Finite automata – lexical analysis
o Pushdown automata – parsing
o Fixed point algorithms – data-flow analysis
o Complex data structures – symbol tables, parse trees, data dependence graphs
o Computer architecture – machine code generation
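One pairing from the list above — finite automata for lexical analysis — can be sketched as a small table-driven DFA. This Python sketch is illustrative only (the state names and character classes are assumptions, not part of the course material); it accepts identifiers of the form letter(letter|digit)*:

```python
# A minimal table-driven DFA accepting identifiers: letter (letter|digit)*

def char_class(c):
    """Map a character to one of the DFA's input classes."""
    if c.isalpha() or c == "_":
        return "letter"
    if c.isdigit():
        return "digit"
    return "other"

# Transition table: state -> {input class -> next state}
TRANSITIONS = {
    "start": {"letter": "in_id"},
    "in_id": {"letter": "in_id", "digit": "in_id"},
}
ACCEPTING = {"in_id"}

def is_identifier(s):
    state = "start"
    for c in s:
        state = TRANSITIONS.get(state, {}).get(char_class(c))
        if state is None:          # no transition: reject immediately
            return False
    return state in ACCEPTING

print(is_identifier("rate60"))     # True
print(is_identifier("60rate"))     # False
```

Real scanner generators such as Lex build exactly this kind of transition table automatically from regular expressions.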
Classifications of Compilers
• Compilers are viewed from many perspectives:
– Classifying compilers by the number of passes has its background in the hardware
resource limitations of early computers.
• Compiling involves a great deal of work, and early computers did not have
enough memory to contain one program that did all of it.
• So compilers were split up into smaller programs, each of which made a pass
over the source (or some representation of it), performing some of the
required analysis and translation.
• However, all compilers perform the same basic tasks to accomplish their actions.


Classifications of Compilers
1. Single (One) Pass Compilers:- a compiler that passes through the
source code of each compilation unit only once.
– Also called narrow compilers.
– The ability to compile in a single pass has classically been seen as a benefit because
it simplifies the job of writing a compiler.
– Single-pass compilers generally perform compilations faster than multi-pass
compilers.
– Due to the resource limitations of early systems, many early languages were
specifically designed so that they could be compiled in a single pass (e.g., Pascal).
• Disadvantages of single-pass compilers:
– It is not possible to perform many of the sophisticated optimizations needed to
generate high quality code.
– It can be difficult to count exactly how many passes an optimizing compiler makes.
– For instance, different phases of optimization may analyze one expression many
times but only analyze another expression once.
Classifications of Compilers
2. Multi-Pass Compilers:- a compiler that processes the source
code or abstract syntax tree of a program several times.
– Also called wide compilers.
– Phases are separate “programs” that run sequentially.
– By splitting the compiler up into small programs, a correct
compiler is easier to produce.
– Proving the correctness of a set of small programs often requires
less effort than proving the correctness of a larger, single,
equivalent program.
– Many programming languages cannot be compiled in a single
pass; for example, more recent languages such as Java
require a multi-pass compiler.
Classifications of Compilers
3. Load and Go Compilers:- generate machine code and then
immediately execute it.
– Compilers usually produce either absolute code, which is executed
immediately upon conclusion of the compilation, or object code,
which is transformed by a linking loader into absolute code.
– These compiler organizations are called Load & Go and
Link/Load.
– Both Load & Go and Link/Load compilers use a number of passes
to translate the source program into absolute code.
– A pass reads some form of the source program, transforms it
into another form, and normally outputs this form to an
intermediate file, which may be the input of a later pass.
Classifications of Compilers
4. Optimizing Compilers:- a compiler that tries to minimize or
maximize some attributes of an executable computer program.
– The most common requirement is to minimize the time taken to execute
a program; a less common one is to minimize the amount of memory
occupied.
– The growth of portable computers has created a market for minimizing
the power consumed by a program.
– Compiler optimization is generally implemented as a sequence of
optimizing transformations: algorithms that take a program and
transform it into a semantically equivalent output program that
uses fewer resources.
– Types of optimization include peephole optimization, local
optimization, global optimization, loop optimization, and machine-code
optimization.
Cousins of Compilers
Language Processing System
Cousins of Compilers
A. Assembler:- is a translator that converts programs written in assembly language into
machine code.
– It translates mnemonic operation codes to their machine-language equivalents.
– It assigns machine addresses to symbolic labels.

B. Interpreter:- is a computer program that translates high-level instructions/programs
into machine code as they are encountered.
– It produces the output of each statement as it is interpreted
– It generally uses one of the following strategies for program execution:
• Execute the source code directly
• Translate the source code into some efficient intermediate representation and immediately execute it
• Explicitly execute stored precompiled code made by a compiler that is part of the interpreter
system
Cousins of Compilers

C. Linker:- is a program that takes one or more object files generated by a
compiler and combines them into a single executable program.
D. Loader:- is the part of an operating system that is responsible for loading
programs from executables (i.e., executable files) into memory, preparing
them for execution, and then executing them.
Cousins of Compilers
Compiler vs. Interpreter

• Most languages are usually thought of as using either one or the other:
– Compilers: FORTRAN, COBOL, C, C++, Pascal, PL/1
– Interpreters: Lisp, Scheme, BASIC, APL, Perl, Python, Smalltalk
• BUT: not always implemented this way
Basic Compiler Design
• Involves writing a huge program that takes as input another
program in the source language for the compiler, and gives
as output an executable that we can run.
– To make the code easy to modify, a modular design
(decomposition) methodology is usually used to design a compiler.
• There are two design strategies:
1. Write a “front end” of the compiler (i.e. the lexer, parser,
semantic analyzer, and assembly tree generator), and write a
separate back end for each platform that you want to support
2. Write an efficient, highly optimized “back end”, and write a
different front end for each of several languages, such as
Fortran, C, C++, and Java.
Basic Compiler Design
The Analysis-Synthesis Model of Compilation
• There are two parts to compilation: analysis and synthesis.
– During analysis, the operations implied by the source program are
determined and recorded in a hierarchical structure called a tree.
 Breaks up the source program into constituent pieces
 Creates an intermediate representation of the source program
– During synthesis, the translated code is produced.
 Constructs the target program from the intermediate representation
 Takes the tree structure and translates the operations into the target program
Basic Compiler Design
Analysis
• In compiling, analysis has three phases:
1. Linear analysis: stream of characters read from left-to-right and grouped into
tokens; known as lexical analysis or scanning
• Converting input text into stream of known objects called tokens.
• It simplifies parsing process
2. Hierarchical analysis: tokens grouped hierarchically with collective meaning;
known as parsing or syntax analysis
• Translating code to rules of grammar.
• Building representation of code.
3. Semantic analysis: check if the program components fit together meaningfully
• Checks source program for semantic errors
• Gathers type information for subsequent code generation (type checking)
• Identifies operator and operands of expressions and statements
Basic Compiler Design
Synthesis
• The synthesis of the target program is
composed of two phases, which are:
– Code Optimization, and
– Code Generation.
• The synthesis part of the compilation process is
also called the back-end of a compiler.
Phases of a Compiler

Phase I: Lexical Analysis (Scanning)
Phase II: Syntax Analysis (Parsing)
Phase III: Semantic Analysis
Phase IV: Intermediate Code Generation
Phase V: Assembly Code Generation
Phase VI: Machine Code Generation and Linking
Phases of a Compiler
Phase I: Lexical Analysis (Scanning)
• This is the low-level text processing portion of the compiler
• The source file, a stream of characters, is broken into meaningful chunks called tokens.
– For example:
void main()
{
int x;
x=3;
}
• The lexical analyzer (scanner) reads a stream of characters and puts them together
into some meaningful (with respect to the source language) units called tokens.
– Typically, spaces, tabs, end-of-line characters and comments are ignored by the lexical
analyzer.
• To design a lexical analyzer:
– Input a description (regular expressions) of the tokens in the language, and output a lexical
analyzer (a program).
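A scanner of the kind described above can be sketched directly from a description of the tokens as regular expressions. The Python sketch below is illustrative (the token names and patterns are assumptions, not from the slides); it tokenizes the `x=3;` style of input shown earlier, skipping whitespace:

```python
import re

# Token specification: (name, regular expression) — an illustrative set
TOKEN_SPEC = [
    ("NUM",    r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("SEMI",   r";"),
    ("SKIP",   r"\s+"),          # spaces, tabs, newlines are ignored
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Yield (token_name, lexeme) pairs for the input string."""
    for m in MASTER.finditer(source):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

print(list(tokenize("int x; x = 3;")))
# [('ID', 'int'), ('ID', 'x'), ('SEMI', ';'), ('ID', 'x'),
#  ('ASSIGN', '='), ('NUM', '3'), ('SEMI', ';')]
```

This mirrors the design stated above: the input is a description of the tokens (regular expressions), and the output is a working lexical analyzer.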
Phases of a Compiler
Phase II: Syntax Analysis (Parsing)
• A parser gets a stream of tokens from the scanner, and determines if the syntax
(structure) of the program is correct according to the (context-free) grammar of the
source language.
• Then, it produces a data structure, called a parse tree or an abstract syntax tree,
which describes the syntactic structure of the program.
• The parser
– Ensures that the sequence of tokens returned by the lexical analyzer forms a syntactically
correct program
– It also builds a structured representation of the program called an abstract syntax tree that is
easier for the type checker to analyze than a stream of tokens
– It catches the syntax errors as the statement below:
if if (x > 3) then x = x + 1
• Context-free grammars will be used (as the input) by the parser generator to describe
the syntax of the compiling language
• Most compilers do not generate a parse tree explicitly but rather go to intermediate
code directly as syntax analysis takes place.
Phases of a Compiler
Phase II: Syntax Analysis (Parsing)
Parse Tree
• The output of parsing, giving a top-down description of the program syntax
• Root node is entire program and leaves are tokens that were identified during lexical analysis
• Constructed by repeated application of rules in Context Free Grammar (CFG)
• Syntax structures are analyzed by DPDA (Deterministic Push Down Automata)
• Example: parse tree for position := initial + rate*60
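The tree for `position := initial + rate*60` can be represented in code as nested nodes, just as a parser would build it by repeated application of CFG rules. In this illustrative Python sketch the tuple encoding is an assumption, not the course's notation:

```python
# Each node is (label, children...); the leaves are tokens from lexical analysis.
tree = ("assign",
        ("id", "position"),
        ("+",
         ("id", "initial"),
         ("*",
          ("id", "rate"),
          ("num", 60))))

def show(node, depth=0):
    """Print the tree with indentation, root first (top-down)."""
    label, *children = node
    print("  " * depth + str(label))
    for child in children:
        if isinstance(child, tuple):
            show(child, depth + 1)
        else:
            print("  " * (depth + 1) + str(child))

show(tree)
```

The root corresponds to the whole assignment statement and the leaves are the identifiers and the number 60, matching the description above.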
Phases of a Compiler
Phase III: Semantic Analysis
• It gets the parse tree from the parser, together with information about
some syntactic elements, and determines whether the semantics (meaning)
of the program is correct
– It detects errors in the program, such as using variables before they are
declared or assigning an integer value to a Boolean variable, …
• This part deals with static semantics:
– semantics of programs that can be checked by reading off from the program only.
– syntax of the language that cannot be described by a context-free grammar.
• Mostly, a semantic analyzer does type checking (i.e. gathers type
information for subsequent code generation.)
• It modifies the parse tree in order to get that (static) semantically correct
code
– In this phase, the abstract syntax tree that is produced by the parser is traversed,
looking for semantic errors.
Phases of a Compiler
Phase III: Semantic Analysis
• The main tool used by the semantic analyzer is a symbol table
• Symbol table:- is a data structure with a record for each identifier and its
attributes
– Attributes include storage allocation, type, scope, etc
– All the compiler phases insert and modify the symbol table
• Discovery of meaning in a program using the symbol table involves:
– Do static semantics check
– Simplify the structure of the parse tree ( from parse tree to abstract syntax tree (AST) )
• Static semantics check
– Making sure identifiers are declared before use
– Type checking for assignments and operators
– Checking types and number of parameters to subroutines
– Making sure functions contain return statements
– Making sure there are no repeats among switch statement labels
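A symbol table with the attributes listed above, plus two of the static checks (use before declaration, type checking of assignments), can be sketched as follows. The Python class and method names here are illustrative assumptions, not a prescribed design:

```python
# Minimal symbol table: one record per identifier, holding its attributes.
class SymbolTable:
    def __init__(self):
        self.records = {}

    def declare(self, name, typ, scope="global"):
        """Insert a record with the identifier's attributes."""
        self.records[name] = {"type": typ, "scope": scope}

    def lookup(self, name):
        """Static check: identifiers must be declared before use."""
        if name not in self.records:
            raise NameError(f"'{name}' used before declaration")
        return self.records[name]

def check_assignment(table, target, value_type):
    """Static check: the declared type of the target must match the value."""
    declared = table.lookup(target)["type"]
    if declared != value_type:
        raise TypeError(f"cannot assign {value_type} to {declared} '{target}'")

table = SymbolTable()
table.declare("x", "int")
check_assignment(table, "x", "int")        # OK: types match
# check_assignment(table, "flag", "int")   # would raise NameError
```

Every phase of the compiler consults and updates this same structure, which is why it is listed as the semantic analyzer's main tool.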
Phases of a Compiler
Phase IV: Intermediate Code Generation
• An intermediate code generator takes a parse tree from the semantic
analyzer and generates a program in the intermediate language.
• In some compilers, a source program is translated into an intermediate
code first and then the intermediate code is translated into the target
language.
– In other compilers, a source program is translated directly into the target
language.

• The compiler makes a second pass over the parse tree to produce the
translated code
– If there are no compile-time errors, the semantic analyzer translates the
abstract syntax tree into the abstract assembly tree
– The abstract assembly tree will be passed to the code optimization and
assembly code generation phase
Phases of a Compiler
Phase IV: Intermediate Code Generation
• Using intermediate code is beneficial when compilers that
translate a single source language to many target
languages are required.
– The front end of the compiler (scanner through intermediate code
generator) can be shared by all of them.
– A different back end (code optimizer and code generator) is
required for each target language.
• One of the popular intermediate code is three-address
code.
• A three-address code instruction is in the form of
x = y op z
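Generating three-address code of the form `x = y op z` from an expression tree can be sketched as a post-order walk that introduces temporaries `t1, t2, …` for intermediate results. The tree encoding below is an illustrative assumption:

```python
import itertools

def gen_tac(node, code, temps):
    """Emit three-address instructions for an expression tree.

    A node is either a string (a variable or constant) or a tuple
    (op, left, right). Returns the name holding the node's value.
    """
    if isinstance(node, str):
        return node
    op, left, right = node
    l = gen_tac(left, code, temps)
    r = gen_tac(right, code, temps)
    t = f"t{next(temps)}"                # fresh temporary for the result
    code.append(f"{t} = {l} {op} {r}")   # exactly one operator per instruction
    return t

code, temps = [], itertools.count(1)
result = gen_tac(("+", "initial", ("*", "rate", "60")), code, temps)
code.append(f"position = {result}")
print("\n".join(code))
# t1 = rate * 60
# t2 = initial + t1
# position = t2
```

Note how the tree from the earlier parse-tree example flattens into a sequence where each instruction has at most one operator, which is exactly the three-address property.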
Phases of a Compiler
Phase V: Assembly Code Generation
• The code generator converts the abstract assembly tree into the
actual assembly code
• To do code generation:
– The generator covers the abstract assembly tree with tiles (each
tile represents a small portion of an abstract assembly tree), and
– Outputs the actual assembly code associated with the tiles
used to cover the tree
Phase VI: Machine Code Generation and Linking
• The final phase of compilation converts the assembly
code into machine code and links (via a linker) the
appropriate language libraries
Code Optimization
• Is a process of replacing an inefficient sequence of instructions with a
better sequence of instructions.
• Sometimes called code improvement.
• Code optimization can be done:
– after semantic analysis: performed on the parse tree
– after intermediate code generation: performed on the intermediate code
– after code generation: performed on the target code
• There are two types of code optimizations
1. Local Optimization
2. Global Optimization
Code Optimization
1. Local Optimization
– The compiler looks at a very small block of instructions and tries to determine
how it can improve the efficiency of this local code block.
– Relatively easy; included as part of most compilers
– Examples of possible local optimizations
1. Constant evaluation
2. Strength reduction
3. Eliminating unnecessary operations

2. Global Optimization
– The compiler looks at large segments of the program to decide how to
improve performance
– Much more difficult; usually omitted from all but the most sophisticated and
expensive production-level “optimizing compilers”
– Optimization cannot make an inefficient algorithm efficient
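Two of the local optimizations named above can be sketched on three-address-style instructions: constant evaluation (folding an operation whose operands are both known at compile time) and strength reduction (replacing a multiply by a power of two with a cheaper shift). The tuple instruction format is an illustrative assumption:

```python
# Each instruction is (dest, left, op, right); operands are ints or names.
def optimize_local(instr):
    dest, left, op, right = instr
    # Constant evaluation: both operands known at compile time
    if isinstance(left, int) and isinstance(right, int) and op in ("+", "*"):
        value = left + right if op == "+" else left * right
        return (dest, value, "const", None)
    # Strength reduction: x * 2**k  ->  x << k
    if op == "*" and isinstance(right, int) and right > 0 \
            and right & (right - 1) == 0:
        return (dest, left, "<<", right.bit_length() - 1)
    return instr                     # nothing applies: leave it unchanged

print(optimize_local(("t1", 3, "+", 4)))    # ('t1', 7, 'const', None)
print(optimize_local(("t2", "x", "*", 8)))  # ('t2', 'x', '<<', 3)
```

Both rewrites look at a single instruction in isolation, which is what makes them *local*: no analysis of the surrounding program is needed.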
Summary of Phases of a Compiler
• Programmer (source code producer)
– Output: Source string
– Sample: A=B+C;
• Scanner (performs lexical analysis)
– Output: Token string, and a symbol table with names
– Sample: ‘A’, ‘=’, ‘B’, ‘+’, ‘C’, ‘;’
• Parser (performs syntax analysis based on the grammar of the programming language)
– Output: Parse tree or abstract syntax tree
– Sample:
      ;
      |
      =
     / \
    A   +
       / \
      B   C
• Semantic analyzer (type checking, ...)
– Output: Annotated parse tree or abstract syntax tree
• Intermediate code generator
– Output: Three-address code, quads, or RTL
– Sample:
    int2fp B t1
    + t1 C t2
    := t2 A
• Optimizer
– Output: Three-address code, quads, or RTL
– Sample:
    int2fp B t1
    + t1 #2.3 A
• Code generator
– Output: Assembly code
– Sample:
    MOVF #2.3, r1
    ADDF2 r1, r2
    MOVF r2, A
Compiler Construction Tools
• Various tools are used in the construction of the
various parts of a compiler.
– Using these tools, it is now relatively easy to construct a
compiler.
• The following are software development tools
available to implement one or more compiler phases
 Scanner generators
 Parser generators
 Syntax-directed translation engines
 Automatic code generators
 Data Flow Engines
Compiler Construction Tools
• Scanner generators for C/C++: Flex, Lex.
• Parser generators for C/C++: Bison, YACC.
• Available scanner generators for Java:
– JLex, a scanner generator for Java, very similar to Lex.
– JFlex, flex for Java.
• Available parser generators for Java:
– CUP, a parser generator for Java, very similar to YACC.
– BYACC/J, a different version of Berkeley YACC for Java. It is an extension of the
standard YACC (a -j flag has been added to generate Java code).
• Other compiler tools:
– JavaCC, a parser generator for Java, including scanner generator and parser
generator. Input specifications are different than those suitable for Lex/YACC.
Also, unlike YACC, JavaCC generates a top-down parser.
– ANTLR, a set of language translation tools (formerly PCCTS). Includes
scanner/parser generators for C, C++, and Java.
End of CH1!
• Thank You!!!
