
Chapter 1

Introduction to Compiler Design

Chapter – 1 : Introduction to 1 Bahir Dar Institute of


Contents
▪ What is a compiler?
▪ History of compilers
▪ Programs related to compilers / cousins of the compiler
▪ Why study compilers?
▪ Analysis of the source program
▪ Phases of compiler design
  • Scanner
  • Parser
  • Semantic analyzer
  • Intermediate code generator
  • Code optimizer
  • Code generator
▪ Symbol tables and error handling
▪ Compiler construction tools
What is a compiler?
▪ A computer's CPU can execute only very simple, primitive operations (move, add, …).
  • Recall this from your study of assembly language or computer organization.
▪ Hence, a program for a computer must ultimately be expressed in machine language.
▪ However, writing programs directly in machine language is a tedious and error-prone process.
  • That is why high-level programming languages are used.
▪ Programs written in high-level languages can be very different from machine language.
  • So some means of bridging the gap is required.
  • This is where the compiler comes in.



What is a compiler? …
▪ A compiler is
  • a program that translates a program written in a high-level programming language (suitable for human programmers) into low-level machine language (required by computers);
  • more generally, a program that takes a program written in a source language and translates it into an equivalent program in a target language.

Figure: source program → COMPILER → target program, with error messages as an additional output
  • Source program: normally a program written in a high-level programming language
  • Target program: normally the equivalent program in machine code or assembly language
History of compilers
▪ 1940s:
  • Early stored-program computers were programmed in machine language.
  • Later, assembly languages were developed.
▪ 1950s:
  • Early high-level languages, such as FORTRAN, were developed.
  • Compiler writing was a huge task: the first FORTRAN compiler took 18 person-years.
▪ 1960s onwards:
  • Compilers have been studied intensively.
  • Using software tools, a compiler can now be built in a few months.
Programs related to compilers (COUSINS OF COMPILER)
▪ Other translators and programs are related to compilers or used together with them, and they often come together with compilers in a complete language development environment.
▪ In general, a translator is a program that translates one language into another.
▪ Types of translator:
  1. Interpreter  2. Compiler  3. Assembler
▪ Interpreters directly execute the operations specified in the source program on inputs supplied by the user.
  • They do not produce a target program as a translation.


Programs related to compilers (COUSINS OF COMPILER)
▪ Compilers vs. interpreters
  • A compiler translates the entire program before it runs, whereas an interpreter translates and executes the program one statement (line) at a time.
  • Languages that typically use compilers: FORTRAN, COBOL, C, C++, Pascal, PL/1
  • Languages that typically use interpreters: Lisp, Scheme, BASIC, APL, Perl, Python, Smalltalk
  • Pros of compilers: fast execution (an executable file is created)
  • Cons of compilers: compilation takes time; debugging is harder (though improved by IDEs); more memory is required (due to object code)
  • Pros of interpreters: easy debugging, fast development, lower memory requirements
  • Cons of interpreters: not well suited to large projects; slower execution
Programs related to compilers (COUSINS OF COMPILER)
Compilers: Translate a source (human-writable) program into an executable (machine-readable) program.
Interpreters: Convert a source program and execute it at the same time.

Ideally:
  Compiler:     source code → [Compiler] → executable
                input data → [executable] → output data
  Interpreter:  source code + input data → [Interpreter] → output data
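To make the compiler/interpreter contrast concrete, here is a minimal Python sketch for a made-up expression language (numbers, one input variable x, + and *). The tuple-based program format and the stack instructions PUSH, LOAD, ADD and MUL are assumptions invented for this illustration, not a real instruction set. compile_expr translates the program once into a target program, run executes that target program on input data, and interpret does both in one step without producing a target program.

```python
# Programs are nested tuples, e.g. ("+", ("num", 2), ("*", ("var", "x"), ("num", 3)))
# (a format invented for this sketch).

def interpret(node, env):
    """Interpreter: walk the source program and execute it directly on the input."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return env[node[1]]
    left, right = interpret(node[1], env), interpret(node[2], env)
    return left + right if kind == "+" else left * right

def compile_expr(node, code=None):
    """Compiler: translate the program once into (hypothetical) stack-machine code."""
    if code is None:
        code = []
    kind = node[0]
    if kind == "num":
        code.append(("PUSH", node[1]))
    elif kind == "var":
        code.append(("LOAD", node[1]))
    else:
        compile_expr(node[1], code)   # code for the left operand
        compile_expr(node[2], code)   # code for the right operand
        code.append(("ADD",) if kind == "+" else ("MUL",))
    return code

def run(code, env):
    """The 'executable': run the target program on input data."""
    stack = []
    for instr in code:
        op = instr[0]
        if op == "PUSH":
            stack.append(instr[1])
        elif op == "LOAD":
            stack.append(env[instr[1]])
        else:
            b, a = stack.pop(), stack.pop()   # right operand is on top
            stack.append(a + b if op == "ADD" else a * b)
    return stack.pop()

program = ("+", ("num", 2), ("*", ("var", "x"), ("num", 3)))
target = compile_expr(program)                          # translate once ...
print(target)
print(run(target, {"x": 5}), run(target, {"x": 10}))    # ... run many times
print(interpret(program, {"x": 5}))                     # translate and execute together
```

The compiled target code is produced once and reused for x = 5 and x = 10 (giving 17 and 32), while the interpreter re-walks the source tree on every run.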



Programs related to compilers (COUSINS OF COMPILER)
▪ Assemblers convert a program in assembly language into its equivalent program in machine language.
▪ Linkers: a linker is a computer program that takes one or more object files generated by compilers or assemblers and combines them into a single executable program.



Programs related to compilers (COUSINS OF COMPILER)
▪ Loaders load an executable into memory and start it running.
▪ Editors are programs used to write and edit source code.
▪ Debuggers are programs used to locate execution errors in a compiled program.
▪ Preprocessors:
  • A source program may be divided into modules stored in separate files.
  • The task of collecting the source program is sometimes entrusted to a separate program, called a preprocessor.
  • It may also expand shorthands, called macros, into source language statements.
Programs related to compilers (COUSINS OF COMPILER)
▪ A preprocessor produces input to a compiler.
▪ It may perform the following functions:
  • 1. Macro processing: a preprocessor may allow a user to define macros that are shorthands for longer constructs.
  • 2. File inclusion: a preprocessor may include header files into the program text.
  • 3. Rational preprocessing: these preprocessors augment older languages with more modern flow-of-control and data-structuring facilities, e.g., a while-statement or if-statement where the language itself has none.
  • 4. Language extensions: these preprocessors attempt to add capabilities to the language by what amounts to built-in macros.
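The first two functions, macro processing and file inclusion, can be sketched in a few lines of Python. This is a toy that assumes only '#include "file"' and object-like '#define NAME text' directives, with a dict standing in for the file system; real C preprocessors also handle function-like macros, conditional compilation, string literals and comments, none of which are modeled here.

```python
import re

IDENT = re.compile(r"\b[A-Za-z_]\w*\b")

def preprocess(source, files):
    """Toy preprocessor for two directives:
         #include "name"      -> file inclusion
         #define NAME text    -> object-like macro definition
    `files` is a dict standing in for the file system (an assumption of this sketch)."""
    macros, output = {}, []

    def expand(text, active=frozenset()):
        # Replace whole-word macro names by their bodies, rescanning each body;
        # `active` stops a macro from expanding inside its own body.
        def replace(match):
            name = match.group(0)
            if name in macros and name not in active:
                return expand(macros[name], active | {name})
            return name
        return IDENT.sub(replace, text)

    def process(text):
        for line in text.splitlines():
            stripped = line.strip()
            include = re.match(r'#include\s+"([^"]+)"', stripped)
            define = re.match(r"#define\s+([A-Za-z_]\w*)\s+(.*)", stripped)
            if include:
                process(files[include.group(1)])        # splice in the header text
            elif define:
                macros[define.group(1)] = define.group(2)
            else:
                output.append(expand(line))             # macro expansion

    process(source)
    return "\n".join(output)

files = {"limits.h": "#define MAX 100"}
source = '#include "limits.h"\n#define TWICE_MAX (MAX * 2)\nint a[MAX];\nint b = TWICE_MAX;'
print(preprocess(source, files))
```

Here the header supplies MAX, TWICE_MAX is defined in terms of MAX, and both are expanded before the compiler would ever see the text.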
Language Processing System

source program
    ↓ preprocessor
modified source program
    ↓ compiler
target assembly program
    ↓ assembler
relocatable machine code
    ↓ linker/loader (with library files)
target machine code



Language Processing System
▪ A source program may be divided into modules stored in separate files.
▪ The task of collecting the source program is sometimes entrusted to a separate program, called a preprocessor.
▪ The preprocessor may also expand shorthands, called macros, into source language statements.
▪ Large programs are often compiled in pieces, so the relocatable machine code may have to be linked together with other relocatable object files and library files into the code that actually runs on the machine.
▪ The linker resolves external memory addresses, where the code in one file may refer to a location in another file.
▪ The loader then puts together all of the executable object files into memory for execution.
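The linker's job of resolving external addresses can be illustrated with a toy linker. The object-file format below (a code list with None placeholders, a table of defined symbols and a list of relocations) is invented for this sketch and is much simpler than real formats such as ELF or COFF. Pass 1 gives every object file a start address and builds a global symbol table; pass 2 patches each reference with the resolved address.

```python
def link(objects, base=0):
    """Toy two-pass linker.
    Each object file is a dict (a format invented for this sketch):
      'code':    list of words; None marks a slot that needs an address
      'symbols': {name: offset} for symbols defined in this object
      'relocs':  [(offset, name)]: the slot at `offset` needs the address of `name`
    Returns the linked image and the global symbol table."""
    # Pass 1: place the objects one after another and collect symbol addresses.
    addresses, next_addr = {}, base
    for obj in objects:
        for name, offset in obj["symbols"].items():
            if name in addresses:
                raise ValueError(f"duplicate definition of {name!r}")
            addresses[name] = next_addr + offset
        next_addr += len(obj["code"])
    # Pass 2: copy the code and patch every external reference.
    image = []
    for obj in objects:
        code = list(obj["code"])            # copy so the input is not modified
        for offset, name in obj["relocs"]:
            if name not in addresses:
                raise ValueError(f"undefined reference to {name!r}")
            code[offset] = addresses[name]
        image.extend(code)
    return image, addresses

# main_o calls sqrt, which is defined in a separate (library) object file.
main_o = {"code": ["CALL", None, "HALT"], "symbols": {"main": 0}, "relocs": [(1, "sqrt")]}
lib_o = {"code": ["SQRT_BODY", "RET"], "symbols": {"sqrt": 0}, "relocs": []}
print(link([main_o, lib_o]))
```

After linking, the placeholder after CALL holds sqrt's address: 3 when the image starts at address 0, or 103 when it is laid out from base address 100.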
Compiler Construction Related to Other Computer Science Topics
▪ Programming languages
▪ Data structures and algorithms
▪ Theory of computation (automata and formal language theory)
▪ Assembly language
▪ Software engineering
▪ Computer architecture
▪ Operating systems
▪ Discrete mathematics



Analysis of source program
▪ In compiling, analysis consists of three phases:
  • Linear analysis, in which the stream of characters making up the source program is read from left to right and grouped into tokens, which are sequences of characters having a collective meaning.
  • Hierarchical analysis, in which characters or tokens are grouped hierarchically into nested collections with collective meaning.
  • Semantic analysis, in which certain checks are performed to ensure that the components of a program fit together meaningfully.
  • See the "Phases of a compiler" topic for details.



Why Study Compilers?
▪ Compilers enable programming in a high-level language instead of machine instructions.
  • Malleability, portability, modularity, programmer productivity
▪ Increases understanding of language semantics
▪ Seeing the machine code generated for language constructs helps in understanding the performance of languages
▪ Teaches good language design



Why Study Compilers?
▪ Become a better programmer
▪ Insight into the interaction between languages, compilers, and hardware
▪ Fascinating blend of theory and engineering
▪ Direct application of theory to practice
▪ Useful for developing software tools that parse computer code or strings, e.g., editors, debuggers, interpreters, preprocessors, …
  • It is important to understand how compilers work in order to program more effectively.
  • It provides a solid foundation in parsing theory for parser writing.
▪ Resource allocation, "optimization", etc.
▪ You might even write a compiler some day!
Grouping of Phases into Passes / Parts of Compilation
▪ A compiler is not a single box that maps a source program into a target program.
▪ There are two parts to this mapping: analysis and synthesis.
  • Analysis (front end) [lexical, syntax, and semantic analysis]
    – breaks up the source program into its constituent pieces
    – creates an intermediate representation of the source program
    – reports any errors detected
    – stores information about the source program in a data structure called a symbol table
    – is machine independent and language dependent, because it depends primarily on the source language
  • Synthesis (back end) [code optimization + code generation]
    – constructs the desired target program from the intermediate representation and the information in the symbol table
    – is machine dependent and source-language independent, because it depends on the target machine
▪ The compilation process operates as a sequence of phases, each of which transforms one representation of the source program into another.
• NB: Intermediate code generation lies between the front end and the back end.
The Phases of a Compiler…



Lexical Analyzer (Scanner)
▪ Also called the lexer
▪ How it works:
  • Reads characters from the source program.
  • Groups the characters into lexemes (sequences of characters that "go together").
  • Each lexeme corresponds to a token; i.e., for each lexeme, the lexical analyzer produces as output a token of the form (token-name, attribute-value).
  • The scanner returns the next token (plus possibly some additional information) to the parser.
  • The scanner may also discover lexical errors (e.g., erroneous characters).
  • It starts the symbol table, entering new symbols as they are found.
Lexical Analyzer (Scanner)…
▪ Tokens include, e.g.:
  • Reserved words: do if float while
  • Special characters: ( { , + - = ! /
  • Names and numbers: myValue, 3.07e02
▪ What counts as a lexeme, a token, or a bad character depends on the definition of the source language.
▪ Examples of tools for lexical analysis:
  • Lex
  • flex
▪ A lexeme is a sequence of characters in the source program that is matched by the pattern for a token.



Lexical Analyzer - Examples
▪ Example 1: consider the expression sum = 3 + 2; in the C programming language. It is tokenized as follows:

  Lexeme   Token        Token type
  sum      identifier   Identifier
  =        assign       Assignment operator
  3        number       Integer literal
  +        addition     Addition operator
  2        number       Integer literal
  ;        semicolon    End of statement

▪ Example 2: position := initial + rate * 60;
  • position, :=, initial, +, rate, *, 60 and ; are all lexemes.
▪ Blanks, line breaks, etc. are scanned out.
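A scanner that produces (token-name, attribute-value) pairs can be sketched with Python's re module. The token names, the regular expressions and the use of a 1-based symbol-table index as an identifier's attribute are assumptions for this illustration (they mirror the textbook notation <id, 1>); tools such as Lex/flex generate an equivalent automaton-based scanner from a similar list of patterns.

```python
import re

# Token patterns, tried in this order (names and patterns are assumptions of this sketch).
TOKEN_SPEC = [
    ("number", r"\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"),   # 60, 3.07e02
    ("id",     r"[A-Za-z_]\w*"),                     # myValue, rate
    ("op",     r"[=+\-*/;(){},!]"),                  # special characters
    ("skip",   r"\s+"),                              # blanks, tabs, line breaks
    ("error",  r"."),                                # anything else is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))
KEYWORDS = {"do", "if", "float", "while"}

def tokenize(source, symtab):
    """Group characters into lexemes and emit (token-name, attribute-value) pairs.
    New identifiers are appended to `symtab`; their attribute is the 1-based index."""
    tokens = []
    for m in MASTER.finditer(source):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "skip":
            continue                                   # blanks are scanned out
        if kind == "error":
            raise SyntaxError(f"lexical error: unexpected {lexeme!r} at {m.start()}")
        if kind == "id" and lexeme in KEYWORDS:
            tokens.append((lexeme, None))              # reserved word
        elif kind == "id":
            if lexeme not in symtab:
                symtab.append(lexeme)                  # enter new symbol
            tokens.append(("id", symtab.index(lexeme) + 1))
        elif kind == "number":
            is_float = any(ch in lexeme for ch in ".eE")
            tokens.append(("number", float(lexeme) if is_float else int(lexeme)))
        else:
            tokens.append((lexeme, None))              # operator: name is the lexeme
    return tokens

symtab = []
print(tokenize("position = initial + rate * 60", symtab))
print(symtab)
```

For position = initial + rate * 60 this yields <id,1> <=> <id,2> <+> <id,3> <*> <number,60>, with position, initial and rate entered in the symbol table in that order.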
Syntax Analyzer (Parser)
▪ Also known as hierarchical analysis / parsing
▪ Constructs a parse tree from the tokens
▪ A pattern-matching problem:
  • The language grammar is defined by a set of rules that identify legal (meaningful) combinations of symbols.
  • Each application of a rule results in a node in the parse tree.
  • The parser applies these rules repeatedly to the program until the leaves of the parse tree are "atoms".
▪ If no pattern matches, it is a syntax error.
▪ YACC and bison are tools for this.



Syntax Analyzer - Example
▪ Source code:
  position = initial + rate * 60;
▪ Abstract syntax tree (figure): = at the root with children position and +; + has children initial and *; * has children rate and 60.
  • Interior nodes of the tree are OPERATORS;
  • a node's children are its OPERANDS;
  • each subtree forms a logical unit.
  • The subtree with * at its root shows that * has higher precedence than +: the operation "rate * 60" must be performed as a unit, not "initial + rate".
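A recursive-descent parser is one simple way to build this tree. The sketch assumes a tiny grammar for a single assignment with + and * (stmt → id = expr ;, expr → term { + term }, term → factor { * factor }, factor → id | number | ( expr )) and uses a simplified regular-expression tokenizer in place of a real scanner. Because term handles * one level below expr, * binds more tightly than +, which produces exactly the subtree structure described above.

```python
import re

def parse_assignment(source):
    """Recursive-descent parser for the grammar (an assumption of this sketch):
         stmt   -> id '=' expr ';'?
         expr   -> term ('+' term)*
         term   -> factor ('*' factor)*
         factor -> id | number | '(' expr ')'
    Returns an abstract syntax tree of nested tuples."""
    # Simplified tokenizer: numbers, identifiers and single-character symbols.
    tokens = re.findall(r"\d+(?:\.\d+)?|[A-Za-z_]\w*|[=+*();]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok=None):
        nonlocal pos
        cur = peek()
        if cur is None or (tok is not None and cur != tok):
            raise SyntaxError(f"syntax error: expected {tok or 'a token'}, got {cur!r}")
        pos += 1
        return cur

    def factor():
        if peek() == "(":
            expect("(")
            node = expr()
            expect(")")
            return node
        tok = expect()
        if tok[0].isdigit():
            return ("num", float(tok) if "." in tok else int(tok))
        if tok[0].isalpha() or tok[0] == "_":
            return ("id", tok)
        raise SyntaxError(f"syntax error: unexpected {tok!r}")

    def term():
        node = factor()
        while peek() == "*":
            expect("*")
            node = ("*", node, factor())      # '*' nodes end up below '+' nodes
        return node

    def expr():
        node = term()
        while peek() == "+":
            expect("+")
            node = ("+", node, term())
        return node

    target = expect()
    if not (target[0].isalpha() or target[0] == "_"):
        raise SyntaxError("syntax error: assignment target must be an identifier")
    expect("=")
    tree = ("=", ("id", target), expr())
    if peek() == ";":
        expect(";")
    if peek() is not None:
        raise SyntaxError(f"syntax error: unexpected {peek()!r}")
    return tree

print(parse_assignment("position = initial + rate * 60;"))
```

An input such as x = + 1 matches no rule for factor and is reported as a syntax error.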
Semantic Analyzer
▪ Checks the source program for semantic errors, e.g., type errors
  • Annotates and/or changes the abstract syntax tree based on the attribute grammar
  • e.g., annotates a node that represents an expression with its type
  • Example (before and after): in position = initial + rate * 60, where position, initial and rate are floating-point variables, the integer constant 60 is wrapped in an inttofloat conversion.
▪ The most important activity in this phase:
  • Type checking: the compiler checks that each operator has operands that are permitted by the source language specification.
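Type checking with implicit conversion can be sketched as a recursive walk over the tree. The sketch assumes the tuple-shaped tree used for the parser example, a dictionary of declared types standing in for the symbol table, and only two types, int and float. When an int meets a float, it wraps the int operand in an inttofloat node, which is how 60 becomes inttofloat(60). Rejecting a float-to-int assignment is a rule chosen for this sketch, not a statement about C.

```python
def check(node, types):
    """Type-check an AST of nested tuples and return (annotated_node, type).
    `types` maps identifiers to 'int' or 'float' (standing in for the symbol table).
    Mixed int/float operands get an explicit ('inttofloat', ...) conversion node."""
    kind = node[0]
    if kind == "num":
        return node, "float" if isinstance(node[1], float) else "int"
    if kind == "id":
        if node[1] not in types:
            raise TypeError(f"semantic error: undeclared identifier {node[1]!r}")
        return node, types[node[1]]
    if kind in ("+", "*", "="):
        left, lt = check(node[1], types)
        right, rt = check(node[2], types)
        if kind == "=":
            if lt == "int" and rt == "float":
                raise TypeError("semantic error: cannot assign float to int (sketch rule)")
            if lt == "float" and rt == "int":
                right = ("inttofloat", right)
            return (kind, left, right), lt
        if lt != rt:                      # one int, one float: widen the int side
            if lt == "int":
                left = ("inttofloat", left)
            else:
                right = ("inttofloat", right)
        return (kind, left, right), "float" if "float" in (lt, rt) else "int"
    raise TypeError(f"semantic error: unknown node {kind!r}")

tree = ("=", ("id", "position"), ("+", ("id", "initial"), ("*", ("id", "rate"), ("num", 60))))
types = {"position": "float", "initial": "float", "rate": "float"}
print(check(tree, types))
```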
Intermediate Code Generator
▪ Translates the abstract syntax tree into intermediate code.
▪ In other words, it takes the output of semantic analysis and converts it into intermediate code, such as:
  • three-address code
    – each statement has at most three operands (addresses) and, apart from the assignment ":=", at most one operator;
  • an "easy" and "universal" format that can be translated into most assembly languages.
  • Here is an example of three-address code for the abstract syntax tree shown on the preceding slide (id1, id2 and id3 are the symbol-table entries for position, initial and rate):
    – t1 = inttofloat(60)
    – t2 = id3 * t1
    – t3 = id2 + t2
    – id1 = t3
▪ NB: Three-address code consists of a sequence of instructions, each of which has at most three operands.
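The translation from the annotated tree to three-address code can be sketched as a post-order walk that creates a new temporary for each operator. In this sketch the tree leaves already use the symbol-table names id1, id2 and id3, and instructions are produced as plain strings; both are simplifications for illustration.

```python
def gen_three_address(tree):
    """Translate an annotated AST of nested tuples into three-address code.
    Each instruction has at most three addresses; temporaries t1, t2, ...
    are created on demand."""
    code, counter = [], 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def gen(node):
        kind = node[0]
        if kind in ("id", "num"):
            return str(node[1])                    # leaves are used directly as addresses
        if kind == "inttofloat":
            arg = gen(node[1])
            temp = new_temp()
            code.append(f"{temp} = inttofloat({arg})")
            return temp
        if kind == "=":
            code.append(f"{node[1][1]} = {gen(node[2])}")   # right side is generated first
            return node[1][1]
        left, right = gen(node[1]), gen(node[2])   # operands before the operator
        temp = new_temp()
        code.append(f"{temp} = {left} {kind} {right}")
        return temp

    gen(tree)
    return code

annotated = ("=", ("id", "id1"),
             ("+", ("id", "id2"), ("*", ("id", "id3"), ("inttofloat", ("num", 60)))))
for instr in gen_three_address(annotated):
    print(instr)
```

On this tree the sketch emits the same four instructions as the example above.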
Code Optimization
▪ Improves the efficiency of the intermediate code.
  • The goal may be to make the code run faster and/or to use fewer registers.
  • Example: the conversion of 60 is done once at compile time (giving 60.0), and the copy through t3 is removed.
      Before:                   After:
      t1 = inttofloat(60)       t2 = id3 * 60.0
      t2 = id3 * t1             id1 = id2 + t2
      t3 = id2 + t2
      id1 = t3
▪ Current trends:
  • to obtain smaller, but possibly slower, equivalent code for embedded systems;
  • to reduce power consumption;
  • to enable parallelism.
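The two improvements in the example, performing the inttofloat conversion of a constant at compile time and removing the final copy through a temporary, can be sketched as two passes over three-address code. The tuple form (dest, op, arg1, arg2), the op name "copy" and the rule that temporaries are names starting with t are assumptions of this sketch; real optimizers work on richer representations and need data-flow information to apply such rules safely in general.

```python
def optimize(code):
    """Two machine-independent optimizations on three-address code.
    Instructions are tuples (dest, op, arg1, arg2); constants are Python numbers,
    names are strings, and temporaries are assumed to be names starting with 't'."""
    # Pass 1: constant folding. inttofloat of an integer constant is computed
    # now, and the result is substituted into later instructions.
    consts, folded = {}, []
    for dest, op, a, b in code:
        a, b = consts.get(a, a), consts.get(b, b)
        if op == "inttofloat" and isinstance(a, int):
            consts[dest] = float(a)        # t1 = inttofloat(60)  ->  use 60.0
            continue
        folded.append((dest, op, a, b))
    # Pass 2: copy elimination. "t = x op y" followed by "v = t" becomes
    # "v = x op y" when the temporary t is used nowhere else.
    result = []
    for dest, op, a, b in folded:
        uses = sum(instr[2:].count(a) for instr in folded)
        if (op == "copy" and result and result[-1][0] == a
                and isinstance(a, str) and a.startswith("t") and uses == 1):
            result[-1] = (dest,) + result[-1][1:]
            continue
        result.append((dest, op, a, b))
    return result

tac = [("t1", "inttofloat", 60, None), ("t2", "*", "id3", "t1"),
       ("t3", "+", "id2", "t2"), ("id1", "copy", "t3", None)]
for dest, op, a, b in optimize(tac):
    print(dest, "=", a, op, b)
```

The result is the two-instruction "after" column: t2 = id3 * 60.0 and id1 = id2 + t2.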
Code Generation
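▪ A compiler may generate
  • pure machine code (machine-dependent assembly language) directly, which is rare now;
  • virtual machine code.
▪ Generates object code from the (optimized) intermediate code, e.g.:
      Intermediate code:        Target code:
      t2 = id3 * 60.0           LDF  R2, id3
      id1 = id2 + t2            MULF R2, R2, #60.0
                                LDF  R1, id2
                                ADDF R1, R1, R2
                                STF  id1, R1
  • The F in each instruction means it works on floating-point numbers, and # marks 60.0 as an immediate constant.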
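A naive code generator for the optimized code can be sketched as follows. The opcode names LDF, MULF, ADDF and STF come from the example, but the register-assignment policy (hand out R1, R2, ... in order) is an assumption of this sketch, so the register numbers differ from the example while the instruction sequence is the same. It also assumes each temporary is used exactly once, so a temporary's register can be overwritten by the next operation.

```python
def gen_code(tac):
    """Naive code generator for float-only three-address code.
    Instructions are tuples (dest, op, arg1, arg2). Registers R1, R2, ... are
    handed out in order, and every temporary (a name starting with 't') is
    assumed to be used exactly once."""
    opcodes = {"+": "ADDF", "*": "MULF"}
    regs, asm, next_reg = {}, [], 1

    def load(addr):
        nonlocal next_reg
        if addr in regs:                    # temporary already held in a register
            return regs[addr]
        reg = f"R{next_reg}"
        next_reg += 1
        asm.append(f"LDF {reg}, {addr}")    # load a variable from memory
        return reg

    for dest, op, a, b in tac:
        ra = load(a)
        rb = f"#{b}" if isinstance(b, (int, float)) else load(b)   # immediate or register
        asm.append(f"{opcodes[op]} {ra}, {ra}, {rb}")
        regs[dest] = ra
        if not dest.startswith("t"):        # program variable: store the result
            asm.append(f"STF {dest}, {ra}")
    return asm

optimized = [("t2", "*", "id3", 60.0), ("id1", "+", "id2", "t2")]
for line in gen_code(optimized):
    print(line)
```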



Phases of Compilers (Summary)



Symbol Table
▪ Symbol-table management is a part of the compiler that interacts with several of the phases:
  – Identifiers are recognized during lexical analysis and entered in the symbol table.
  – During syntactic and semantic analysis, type and scope information is added.
  – During code generation, type information is used to determine which instructions to use.
  – During optimization, "live analysis" information may be kept in the symbol table.
▪ It is most suitably implemented as a dynamic data structure (linear list, binary tree, hash table).
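A common way to implement a symbol table with nested scopes is a stack of hash tables. The sketch below uses Python dicts as the hash tables; the method names (insert, lookup, update, enter_scope, exit_scope) and the attribute names in the usage lines, such as offset, are illustrative assumptions rather than a fixed interface.

```python
class SymbolTable:
    """Scoped symbol table built from a stack of hash tables (Python dicts).
    Each entry maps a name to a dict of attributes filled in by the phases."""

    def __init__(self):
        self.scopes = [{}]                      # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def insert(self, name, **attrs):
        """Create an entry in the innermost scope (e.g. when a name is declared)."""
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"{name!r} already declared in this scope")
        scope[name] = dict(attrs)
        return scope[name]

    def lookup(self, name):
        """Search from the innermost scope outwards; None if undeclared."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

    def update(self, name, **attrs):
        """Later phases add attributes, e.g. type or storage information."""
        entry = self.lookup(name)
        if entry is None:
            raise KeyError(f"undeclared identifier {name!r}")
        entry.update(attrs)

table = SymbolTable()
table.insert("rate", type="float")
table.enter_scope()
table.insert("rate", type="int")                # inner declaration shadows the outer one
print(table.lookup("rate"))
table.exit_scope()
table.update("rate", offset=8)                  # hypothetical attribute added later
print(table.lookup("rate"))
```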
Handling Errors
▪ Error handling and reporting also occur across many phases:
  – The lexical analyzer reports invalid character sequences.
  – The syntax analyzer reports invalid token sequences.
  – The semantic analyzer reports type and scope errors, and the like.
▪ The compiler may be able to continue after some errors, but other errors may stop the process.
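The idea that a compiler can continue after some errors and stop after others can be sketched with a scanner that reports an invalid character, skips it and keeps going, but gives up once too many errors have accumulated. The token patterns, the (line, column, message) error format and the max_errors limit are assumptions made for this illustration.

```python
import re

PATTERN = re.compile(
    r"(?P<num>\d+)|(?P<id>[A-Za-z_]\w*)|(?P<op>[=+\-*/;()])"
    r"|(?P<ws>[ \t]+)|(?P<nl>\n)|(?P<bad>.)"
)

def scan_with_recovery(source, max_errors=10):
    """Report each invalid character, skip it and continue scanning,
    but stop once `max_errors` errors have been reported.
    Returns (tokens, errors); errors are (line, column, message) triples."""
    tokens, errors = [], []
    line, line_start = 1, 0
    for m in PATTERN.finditer(source):
        kind = m.lastgroup
        if kind == "nl":
            line, line_start = line + 1, m.end()        # track position for messages
        elif kind == "bad":
            column = m.start() - line_start + 1
            errors.append((line, column, f"invalid character {m.group()!r}"))
            if len(errors) >= max_errors:
                errors.append((line, column, "too many errors, compilation stopped"))
                break
        elif kind != "ws":
            tokens.append((kind, m.group()))
    return tokens, errors

tokens, errors = scan_with_recovery("x = 1 @ 2;\ny = $3;")
for err in errors:
    print(err)
```

Both bad characters are reported in one run, and the valid tokens around them are still produced for the next phase.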



Compiler Construction Tools
▪ Scanner generators: produce lexical analyzers, e.g., Lex (Flex)
▪ Parser generators: produce syntax analyzers, e.g., YACC (Yet Another Compiler-Compiler)
▪ Syntax-directed translation engines: generate intermediate code, e.g., YACC (Bison)
▪ Automatic code generators: generate the actual code
  • i.e., they take a collection of rules that translate the intermediate language into machine language.
▪ Data-flow engines: support optimization
  • i.e., they perform code optimization using data-flow analysis, that is, gathering information about how values are transmitted from one part of a program to each other part.
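As an example of data-flow analysis, live-variable analysis computes which variables may still be used later in the program. For a single basic block (straight-line code) it reduces to one backward pass: a variable is live before an instruction if the instruction reads it, or if it is live afterwards and the instruction does not redefine it. The tuple format and the chosen live_out set are assumptions of this sketch; a full data-flow engine iterates the same equations over a control-flow graph until nothing changes.

```python
def liveness(block, live_out):
    """Backward live-variable analysis for one basic block of three-address code.
    Each instruction is (dest, op, arg1, arg2); string operands are variables.
    Returns, for each instruction, the set of variables live just before it."""
    live = set(live_out)
    live_before = [None] * len(block)
    for i in range(len(block) - 1, -1, -1):
        dest, _op, a, b = block[i]
        live.discard(dest)                                   # dest is (re)defined here ...
        live |= {x for x in (a, b) if isinstance(x, str)}    # ... after its operands are read
        live_before[i] = set(live)
    return live_before

tac = [("t1", "inttofloat", 60, None), ("t2", "*", "id3", "t1"),
       ("t3", "+", "id2", "t2"), ("id1", "copy", "t3", None)]
for instr, live in zip(tac, liveness(tac, live_out={"id1"})):
    print(instr, sorted(live))
```

With only id1 live at the end, each temporary is live only between its definition and its single use, so a register allocator could reuse its register right afterwards.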
Types of Compilers
▪ One-pass compiler
  • A compiler that completes the whole compilation process in a single pass, i.e., it traverses the whole source code only once.
▪ Threaded-code compiler
  • A compiler that simply replaces a string (e.g., the name of a subroutine) by the appropriate binary code.
▪ Incremental compiler
  • A compiler that compiles only the changed lines of the source code and updates the object code accordingly.
Types of Compilers…
▪ Stage compiler
  • A compiler that converts the code into assembly code only.
▪ Just-in-time compiler
  • A compiler that converts the code into machine code after the program starts execution.
▪ Retargetable compiler
  • A compiler that can easily be modified to compile source code for different CPU architectures.
▪ Parallelizing compiler
  • A compiler capable of compiling code for a parallel computer architecture.
Reading Assignment
What are the types of compilers? Discuss their differences.
