
Principles of Compiler Design

(SENG471)

Chapter One

Introduction

Preliminaries Required

• Basic knowledge of Programming languages.

• Basic knowledge of Automata and Context Free Grammar.

Textbook:

Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman,

“Compilers: Principles, Techniques, and Tools”

Addison-Wesley, 2007.
Objective
At the end of this session students will be able to:

 Understand the basic concepts and principles of Compiler Design

 Understand the term compiler, its functions and how it works.

 Be familiar with the different classification of compilers.

 Be familiar with cousins of compiler: Linkers, Loaders, Interpreters, Assemblers

 Understand the need for studying compiler design and construction

 Understand the phases of compilation and the steps of compilation

 Understand the history of compilers. (Assignment For G1 Students)

 Be familiar with the different compiler tools (Assignment For G2 Students)

Introduction
Definition

 Compiler is an executable program that can read a program in one high-level language and translate it into an equivalent executable program in machine language.

 A compiler is a computer program that translates a program written in a source language into an equivalent program in a target language.

A source program/code is a program/code written in the source language, which is usually a high-level language.

A target program/code is a program/code written in the target language, which often is a machine language or an intermediate code (object code).

Source program → Compiler → Target program
                    ↓
              Error messages
Contd.
 As a discipline, compiler design involves multiple computer science and engineering courses, such as:


Programming Languages

Data structures and Algorithms

Theory of Computation (Automata and formal language theory)

Assembly language

Software Engineering

Computer Architecture

Operating Systems and

Discrete Mathematics

Why Study the Theory of Compilers?
 Curiosity

 Prerequisite for developing advanced compilers, which continues to be active as

new computer architectures emerge

To improve capabilities of existing compilers/interpreters

To write more efficient code in a high-level language
 Useful to develop software tools that parse computer codes or strings

E.g., editors, debuggers, interpreters, preprocessors, …


 Important to understand how compilers work in order to program more effectively

 To provide solid foundation in parsing theory for parser writing

 To make compiler design an excellent “capstone” project

 To apply almost all of the major computer science fields, such as: automata theory, computer programming, programming language design theory, assembly language, computer architecture, data structures, algorithms, and software engineering.
Classification of Compilers
 Compilers Viewed from Many Perspectives
 Construction
 Single pass
 Multiple pass
 Load & Go
 Functional
 Debugging
 Optimizing

o Classifying compilers by number of passes has its background in the hardware resource limitations of computers.
o Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work.
 So compilers were split up into smaller programs which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
 However, all utilize the same basic tasks to accomplish their actions.
Contd.
1. Single (One) Pass Compilers:- a compiler that passes through the source code of each compilation unit only once.
Also called narrow compilers.
The ability to compile in a single pass has classically been seen as a benefit because
it simplifies the job of writing a compiler.
Single-pass compilers generally perform compilations faster than multi-pass
compilers.
Due to the resource limitations of early systems, many early languages were
specifically designed so that they could be compiled in a single pass (e.g., Pascal).

Disadvantage of single pass Compilers:


It is not possible to perform many of the sophisticated optimizations needed to
generate high quality code.
It can be difficult to count exactly how many passes an optimizing compiler makes.
Contd.
2. Multi-Pass Compilers:- a type of compiler that processes the source code or abstract syntax tree of a program several times.
Also called wide compilers.
Phases are separate "programs", which run sequentially.
Here, by splitting the compiler up into small programs, correct programs are more likely to be produced.
Proving the correctness of a set of small programs often requires less effort than proving the correctness of a larger, single, equivalent program.
Many programming languages cannot be compiled with a single-pass compiler; for example, most recent languages such as Java require a multi-pass compiler.
Contd.
3. Load and Go Compilers:- generates machine code & then immediately
executes it.

Compilers usually produce either absolute code that is executed

immediately upon conclusion of the compilation or object code that is

transformed by a linking loader into absolute code.

These compiler organizations will be called Load & Go and

Link/Load.

Both Load & Go and Link/Load compilers use a number of passes to

translate the source program into absolute code.

A pass reads some form of the source program and transforms it into another representation.
Contd.
4. Optimizing Compilers:- a compiler that tries to minimize or maximize

some attributes of an executable computer program.

The most common requirement is to minimize the time taken to execute a

program; a less common one is to minimize the amount of memory occupied.

The growth of portable computers has created a market for minimizing the

power consumed by a program.

Compiler optimization is generally implemented using a sequence of

optimizing transformations, algorithms which take a program and

transform it to produce a semantically equivalent output program that

uses fewer resources.

Types of optimization include peephole optimization, local optimization, global optimization, loop optimization, machine-code optimization, etc.
Cousins of Compilers
A. Assembler:- is a translator that converts programs written in assembly language
into machine code.
 Translate mnemonic operation codes to their machine language equivalents.
 Assigning machine addresses to symbolic labels.

B. Interpreter:- is a computer program that translates high-level instructions/programs into machine code as they are encountered.
 It produces the output of statements as they are interpreted
 It generally uses one of the following strategies for program execution:
i. execute the source code directly
ii. translate source code into some efficient intermediate representation and
immediately execute this
iii. explicitly execute stored precompiled code made by a compiler which is part of the
interpreter system
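As a sketch of strategy (i) above, directly executing the source, here is a toy interpreter in Python. The miniature language (assignments plus a `print` statement) and the `interpret` helper are hypothetical illustrations, not any real interpreter's design.

```python
# Minimal sketch of interpreter strategy (i): execute the source code
# directly, one statement at a time. The toy language (assignments and
# a "print" statement) is an illustrative assumption.

def interpret(source):
    env = {}                      # runtime variable store
    for line in source.strip().splitlines():
        line = line.strip()
        if line.startswith("print "):
            # evaluate and emit output immediately: interpreters produce
            # the output of a statement as it is interpreted
            print(eval(line[len("print "):], {}, env))
        else:
            name, expr = line.split("=", 1)
            env[name.strip()] = eval(expr, {}, env)

interpret("""
x = 3
y = x * 4
print y + 1
""")
# prints 13
```

Note that no machine code is ever produced; each statement's effect happens as soon as the interpreter reaches it.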

Contd.

A typical language-processing pipeline:

source program
 ↓ preprocessor
modified source program
 ↓ compiler
target assembly program
 ↓ assembler
relocatable machine code
 ↓ linker/loader ← library files
target machine code
C. Linker:- is a program that takes one or more objects generated by a
compiler and combines them into a single executable program.
D. Loader:- is the part of an operating system that is responsible for loading programs from executables (i.e., executable files) into memory, preparing them for execution and then executing them.
Compiler vs. Interpreter

Ideal concept:

 Compiler: source code → Compiler → executable; then executable + input data → output data
 Interpreter: source code + input data → Interpreter → output data

 Most languages are usually thought of as using either one or the other:

 Compilers: FORTRAN, COBOL, C, C++, Pascal, PL/1

 Interpreters: Lisp, Scheme, BASIC, APL, Perl, Python, Smalltalk

 BUT: they are not always implemented this way.
Basic Compiler Design
 Write a huge program that takes as input another program in the source
language for the compiler, and gives as output an executable that we can
run.
To make the code easy to modify, we usually use a modular design (decomposition) methodology to design a compiler.
Two design strategies:
1. Write a “front end” of the compiler (i.e. the lexer, parser, semantic
analyzer, and assembly tree generator), and write a separate back end for
each platform that you want to support
2. Write an efficient highly optimized back end, and write a different front
end for several languages, such as Fortran, C, C++, and Java.
Source code → Front End → Intermediate code → Back End → Target code
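Design strategy 1 can be sketched as follows: one front end produces a shared intermediate form, and a separate back end exists per target platform. The one-instruction toy language, the tuple IR, and both instruction sequences are illustrative assumptions (the "ARM" output is pseudo-assembly), not production code.

```python
# Sketch of design strategy 1: one front end, one back end per target.
# The IR here is a single tuple ("add", dst, a, b); real compilers use
# much richer intermediate representations.

def front_end(source):
    # parse "x = y + z" style input into a tiny tuple IR
    dst, expr = source.split("=")
    left, right = expr.split("+")
    return ("add", dst.strip(), left.strip(), right.strip())

def back_end_x86(ir):
    _, dst, a, b = ir
    return [f"mov eax, {a}", f"add eax, {b}", f"mov {dst}, eax"]

def back_end_arm(ir):
    # illustrative pseudo-assembly, not exact ARM load/store syntax
    _, dst, a, b = ir
    return [f"LDR r0, {a}", f"LDR r1, {b}", f"ADD r0, r0, r1", f"STR r0, {dst}"]

ir = front_end("x = y + z")
print(back_end_x86(ir))   # ['mov eax, y', 'add eax, z', 'mov x, eax']
print(back_end_arm(ir))
```

Supporting a new platform means writing only a new `back_end_*` function; the front end is reused unchanged, which is exactly the appeal of this design.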
The Analysis-Synthesis Model of Compilation
There are two parts to compilation: analysis & synthesis.
 During analysis, the operations implied by the source program are determined and recorded in a hierarchical structure called a tree.

 During synthesis, the operations involved in producing the translated code are performed.

Front End (Analysis): 1. Lexical analysis, 2. Syntax analysis, 3. Semantic analysis
 Breaks up the source program into constituent pieces
 Creates an intermediate representation of the source program

Back End (Synthesis): 4. Code generation, 5. Optimization
 Constructs the target program from the intermediate representation
 Takes the tree structure and translates the operations into the target program


Analysis
 In compiling, analysis has three phases:
1. Linear analysis: stream of characters read from left-to-right and grouped into

tokens; known as lexical analysis or scanning


 Converting input text into stream of known objects called tokens.
 It simplifies parsing process

2. Hierarchical analysis: tokens grouped hierarchically with collective meaning;

known as parsing or syntax analysis


 Translating code to rules of grammar.
 Building representation of code.

3. Semantic analysis: check if the program components fit together meaningfully


 Checks source program for semantic errors
 Gathers type information for subsequent code generation (type checking)
 Identifies operators and operands of expressions and statements
Phases of Compilation

General Structure of a Compiler:

Stream of characters
 ↓ scanner
Stream of tokens
 ↓ parser
Parse/syntax tree
 ↓ semantic analyzer
Annotated tree
 ↓ intermediate code generator
Intermediate code
 ↓ code optimization
Intermediate code
 ↓ code generator
Target code
 ↓ code optimization
Target code
Phase I: Lexical Analysis
 The low-level text processing portion of the compiler
 The source file, a stream of characters, is broken into larger chunks called tokens.
For example:

void main()
{
int x;
x=3;
}

It will be broken into 13 tokens as below:

void main ( ) { int x ; x = 3 ; }
 The lexical analyzer (scanner) reads a stream of characters and puts them
together into some meaningful (with respect to the source language) units
called tokens.
 Typically, spaces, tabs, end-of-line characters and comments are ignored by the lexical analyzer.
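The scanning step above can be sketched with a regular-expression-based tokenizer. The token categories and the `tokenize` helper below are illustrative assumptions sized for this one example, not a complete C lexer.

```python
import re

# Minimal scanner sketch: each token category is a regular expression;
# whitespace matches the SKIP category and is discarded.
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:void|int)\b"),
    ("ID",      r"[A-Za-z_]\w*"),
    ("NUM",     r"\d+"),
    ("PUNCT",   r"[(){};=]"),
    ("SKIP",    r"[ \t\r\n]+"),   # spaces, tabs, newlines are ignored
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(source):
    tokens = []
    for m in MASTER.finditer(source):
        if m.lastgroup != "SKIP":     # drop whitespace, keep real tokens
            tokens.append(m.group())
    return tokens

print(tokenize("void main() { int x; x=3; }"))
# ['void', 'main', '(', ')', '{', 'int', 'x', ';', 'x', '=', '3', ';', '}']
```

Running it on the slide's example yields exactly the 13 tokens listed above.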



Phase II: Parsing (Syntax Analysis)
A parser gets a stream of tokens from the scanner, and determines if the syntax
(structure) of the program is correct according to the (context-free) grammar of
the source language.
 Then, it produces a data structure, called a parse tree or an abstract syntax tree,
which describes the syntactic structure of the program.
 The parser ensures that the sequence of tokens returned by the lexical
analyzer forms a syntactically correct program
 It also builds a structured representation of the program called an abstract
syntax tree that is easier for the type checker to analyze than a stream of
tokens
 It catches syntax errors, as in the statement below:
if if (x > 3) then x = x + 1
 Context-free grammars are used (as the input to the parser generator) to describe the syntax of the language being compiled
Parse Tree
 The parse tree is the output of parsing; it gives a top-down description of the program's syntax
 Its root node is the entire program, and its leaves are the tokens that were identified during lexical analysis
 Constructed by repeated application of rules in Context Free Grammar (CFG)
 Syntax structures are analyzed by DPDA (Deterministic Push Down Automata)
 Example: parse tree for position:=initial + rate*60

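The repeated application of grammar rules can be sketched with a small recursive-descent parser for the example above (one common way to implement syntax analysis). The four-rule grammar and the tuple-shaped tree are illustrative assumptions, not the book's code.

```python
import re

# Recursive-descent parser sketch for "position := initial + rate * 60".
# Assumed illustrative grammar:
#   stmt   -> ID ":=" expr
#   expr   -> term  { "+" term }
#   term   -> factor { "*" factor }
#   factor -> ID | NUM
def parse(source):
    tokens = re.findall(r":=|[+*]|\w+", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        return eat()                      # leaf: identifier or number

    def term():
        node = factor()
        while peek() == "*":
            eat()
            node = ("*", node, factor())  # left-associative
        return node

    def expr():
        node = term()
        while peek() == "+":
            eat()
            node = ("+", node, term())
        return node

    target = eat()
    assert eat() == ":="                  # expect the assignment operator
    return (":=", target, expr())

print(parse("position := initial + rate * 60"))
# (':=', 'position', ('+', 'initial', ('*', 'rate', '60')))
```

Because `term` binds tighter than `expr`, the tree correctly groups `rate * 60` under the `+` node, matching the classic parse tree for this statement.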
Phase III: Semantic Analysis
 It gets the parse tree from the parser together with information about some
syntactic elements
 It determines if the semantics (meanings) of the program is correct.
 It detects errors of the program, such as using variables before they are
declared, assign an integer value to a Boolean variable, …
 This part deals with static semantics:
 semantics of programs that can be checked by reading the program only;
 the parts of the language's rules that cannot be described by a context-free grammar.
 Mostly, a semantic analyzer does type checking (i.e. Gathers type information
for subsequent code generation.)
 It modifies the parse tree in order to obtain (statically) semantically correct code
Contd.
 The main tool used by the semantic analyzer is a symbol table
 Symbol table:- is a data structure with a record for each identifier and its
attributes
 Attributes include storage allocation, type, scope, etc
 All the compiler phases insert and modify the symbol table
 Discovery of meaning in a program using the symbol table
 Do static semantics check
 Simplify the structure of the parse tree ( from parse tree to abstract syntax tree
(AST) )
Static semantics check
 Making sure identifiers are declared before use
 Type checking for assignments and operators
 Checking types and number of parameters to subroutines
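A minimal symbol-table sketch follows, assuming a dictionary keyed by identifier with type/scope/offset attributes; the attribute set and method names are illustrative, not a specific compiler's API.

```python
# Symbol-table sketch: one record per identifier plus its attributes.
# Attributes (type, scope, offset) and method names are illustrative.
class SymbolTable:
    def __init__(self):
        self.records = {}

    def declare(self, name, type_, scope="global", offset=None):
        if name in self.records:
            raise NameError(f"redeclaration of '{name}'")
        self.records[name] = {"type": type_, "scope": scope, "offset": offset}

    def lookup(self, name):
        # static-semantics check: identifiers must be declared before use
        if name not in self.records:
            raise NameError(f"'{name}' used before declaration")
        return self.records[name]

syms = SymbolTable()
syms.declare("x", "int", scope="local", offset=0)
print(syms.lookup("x")["type"])   # int
```

The `lookup` failure path is exactly the "identifiers declared before use" check listed above; type checking would then compare the stored `type` attributes of operands.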
Phase IV: Intermediate Code Generation
 An intermediate code generator

 takes a parse tree from the semantic analyzer


 generates a program in the intermediate language.
 In some compilers, a source program is translated into an intermediate code

first and then the intermediate code is translated into the target language.
 In other compilers, a source program is translated directly into the target

language.

 Compiler makes a second pass over the parse tree to produce the translated

code
 If there are no compile-time errors, the semantic analyzer translates the abstract syntax tree into the abstract assembly tree
Contd.

 Using intermediate code is beneficial when compilers that translate a single source language to many target languages are required.

 The front end of a compiler:- scanner through intermediate code generator:- can be reused for each such compiler.

 A different back end:- code optimizer and code generator:- is required for each target language.

 One popular intermediate code is three-address code.

 A three-address code instruction is of the form x = y op z.
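Generating three-address code of the form x = y op z from an expression tree can be sketched as below. The tuple-shaped AST and the temporary-naming scheme (t1, t2, ...) are illustrative assumptions.

```python
# Three-address code generation sketch: walk the expression tree
# bottom-up, emitting one "temp = left op right" instruction per
# interior node, with a fresh temporary for each result.
def gen_tac(node, code, counter=None):
    if counter is None:
        counter = [0]
    if isinstance(node, str):
        return node                       # leaf: variable or constant
    op, left, right = node
    l = gen_tac(left, code, counter)
    r = gen_tac(right, code, counter)
    counter[0] += 1
    temp = f"t{counter[0]}"
    code.append(f"{temp} = {l} {op} {r}")
    return temp

# position := initial + rate * 60
ast = ("+", "initial", ("*", "rate", "60"))
code = []
result = gen_tac(ast, code)
code.append(f"position = {result}")
for line in code:
    print(line)
# t1 = rate * 60
# t2 = initial + t1
# position = t2
```

Each instruction has at most one operator on the right-hand side, which is what makes three-address code convenient for the optimizer and code generator that follow.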
Phase V: Assembly Code Generation

 The code generator converts the abstract assembly tree into the actual assembly code.

 To do code generation:

 the generator covers the abstract assembly tree with tiles (each tile represents a small portion of an abstract assembly tree), and

 outputs the actual assembly code associated with the tiles used to cover the tree.

Phase VI: Machine Code Generation and Linking

 The final phase of compilation converts the assembly code into machine code and links in (by a linker) the appropriate language libraries.
Code Optimization
 Replacing an inefficient sequence of instructions with a better sequence of

instructions.
 Sometimes called code improvement.
 Code optimization can be done:
 after semantic analyzing
performed on a parse tree
 after intermediate code generation
performed on an intermediate code
 after code generation
performed on a target code

 Two types of optimization:

1. Local
2. Global
Local Optimization

 The compiler looks at a very small block of instructions and tries to determine

how it can improve the efficiency of this local code block

 Relatively easy; included as part of most compilers

Examples of possible local optimizations

1. Constant evaluation

2. Strength reduction

3. Eliminating unnecessary operations

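The three local optimizations above can be sketched as a peephole-style pass over single three-address instructions. The instruction format ("dst = a op b") and the specific rewrite rules are illustrative assumptions.

```python
import re

# Peephole sketch of the three local optimizations listed above,
# applied to one three-address instruction at a time.
def optimize(instr):
    m = re.fullmatch(r"(\w+) = (\w+) ([+*]) (\w+)", instr)
    if not m:
        return instr
    dst, a, op, b = m.groups()
    # 1. Constant evaluation: fold an operation on two constants
    if a.isdigit() and b.isdigit():
        value = int(a) + int(b) if op == "+" else int(a) * int(b)
        return f"{dst} = {value}"
    # 2. Strength reduction: replace x * 2 with the cheaper x + x
    if op == "*" and b == "2":
        return f"{dst} = {a} + {a}"
    # 3. Eliminating unnecessary operations: x + 0 is just x
    if op == "+" and b == "0":
        return f"{dst} = {a}"
    return instr

print(optimize("t1 = 4 * 60"))   # t1 = 240
print(optimize("t2 = x * 2"))    # t2 = x + x
print(optimize("t3 = y + 0"))    # t3 = y
```

Each rule only inspects one instruction in isolation, which is what makes local optimization "relatively easy" compared with the whole-program reasoning needed for global optimization.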
Global Optimization

 The compiler looks at large segments of the program to decide how to improve

performance

 Much more difficult; usually omitted from all but the most sophisticated and

expensive production-level “optimizing compilers”

 Optimization cannot make an inefficient algorithm efficient

The Phases of a Compiler
Phase | Output | Sample
Programmer (source code producer) | Source string | A=B+C;
Scanner (performs lexical analysis) | Token string, and symbol table with names | ‘A’, ‘=’, ‘B’, ‘+’, ‘C’, ‘;’
Parser (performs syntax analysis based on the grammar of the programming language) | Parse tree or abstract syntax tree |
        =
       / \
      A   +
         / \
        B   C
Semantic analyzer (type checking, etc.) | Annotated parse tree or abstract syntax tree |
Intermediate code generator | Three-address code, quads, or RTL | int2fp B t1 ; + t1 C t2 ; := t2 A
Optimizer | Three-address code, quads, or RTL | int2fp B t1 ; + t1 #2.3 A
Code generator | Assembly code | MOVF #2.3,r1 ; ADDF2 r1,r2 ; MOVF r2,A
Summary of Phases of Compiler

Compiler Construction Tools

Software development tools are available to implement one or more compiler phases:
 Scanner generators
 Parser generators
 Syntax-directed translation engines
 Automatic code generators
 Data-flow engines

 Scanner generators for C/C++: Flex, Lex.
 Parser generators for C/C++: Bison, YACC.
 Available scanner generators for Java:
 JLex, a scanner generator for Java, very similar to Lex.
 JFlex, flex for Java.
 Available parser generators for Java:
 CUP, a parser generator for Java, very similar to YACC.
 BYACC/J, a different version of Berkeley YACC for Java. It is an extension of the standard YACC (a -j flag has been added to generate Java code).

Other compiler tools:
 JavaCC, a parser generator for Java, including a scanner generator and a parser generator. Input specifications are different from those suitable for Lex/YACC. Also, unlike YACC, JavaCC generates a top-down parser.
 ANTLR, a set of language translation tools (formerly PCCTS). Includes scanner/parser generators for C, C++, and Java.
