
COMPILER DESIGN

(MODULE 1- INTRODUCTION TO LANGUAGE)


1. FINITE AUTOMATA / FINITE STATE MACHINE :
Finite automata (finite state machines) are used to recognize patterns. A finite automaton
takes a string of symbols as input and changes its state accordingly: when the expected
symbol is read, a transition to the next state occurs.
1.1 DETERMINISTIC FINITE AUTOMATA :
A finite automaton is deterministic (a DFA) if the machine reads the input string one
symbol at a time and, for every state and input symbol, has exactly one transition.
In a DFA there is only one path for a given input from the current state to the next
state, and a DFA can contain multiple final states. DFAs are used in the lexical analysis
phase of a compiler.
1.2 NON DETERMINISTIC FINITE AUTOMATA :
A finite automaton is non-deterministic (an NFA) when there may be several paths for a
given input from the current state to the next state, i.e. it can transition to any number
of states on a particular input. An NFA can also accept null (epsilon) moves, which change
state without consuming input.
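The determinism described above can be sketched as a small table-driven DFA in Python. This is an illustrative example, not from the notes: the machine, its state names and the language it accepts (binary strings ending in "01") are all invented for the sketch.

```python
# A minimal DFA sketch: accepts binary strings that end in "01".
# States: "q0" (start), "q1" (last symbol was 0), "q2" (last two were "01", accepting).
TRANSITIONS = {
    ("q0", "0"): "q1", ("q0", "1"): "q0",
    ("q1", "0"): "q1", ("q1", "1"): "q2",
    ("q2", "0"): "q1", ("q2", "1"): "q0",
}
ACCEPTING = {"q2"}

def dfa_accepts(s: str) -> bool:
    state = "q0"
    for symbol in s:
        # Deterministic: exactly one next state per (state, symbol) pair.
        state = TRANSITIONS[(state, symbol)]
    return state in ACCEPTING
```

Because every (state, symbol) pair maps to exactly one next state, the machine never has to choose between moves; an NFA would map some pairs to a set of states instead.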
COMPILER DESIGN
(MODULE 2- INTRODUCTION TO COMPILER)
1. COMPILER DESIGN :
• A compiler is a complex computer program (or set of programs) that translates text written in a computer
language (the source language) into another computer language (the target language).
• A compiler is likely to perform many operations such as lexing, preprocessing, parsing, semantic
analysis, code optimizations, and code generation.
1.1 APPLICATIONS OF A COMPILER TECHNOLOGY & TOOLS
• Implementing general-purpose, domain-specific and device-specific programming languages such as C, C++, Java etc.
• Implementing a server that uses a protocol such as HTTP or IMAP
• Translating programs from one language to another
• Extending an existing programming language
1.2 CHALLENGES OF COMPILER DESIGN
1. Many variations:
many programming languages (e.g., C++, Java) and many operating systems (e.g., Windows, Linux)
2. Qualities of a compiler (in order of importance):
• the compiler itself must be bug-free
• it must generate correct machine code
• the generated machine code must run fast
• the compiler itself must run fast
• the compiler must be portable
3. Building a compiler requires knowledge of :
• programming languages, algorithms and data structures, and software engineering.
2. TYPE OF COMPILER :
The different types of compiler are as follows :
1. One pass compiler 2. Two pass compiler 3. Multi pass compiler
2.1 ONE PASS COMPILER / SINGLE PASS COMPILER .
A one pass compiler, also known as a Narrow Compiler, is a type of compiler that passes through the source
code of each compilation unit only once and immediately translates each code section into its final
machine code. It is used to implement programming languages such as Pascal.

2.2 MULTI PASS COMPILER .


A multi-pass compiler, also known as a Wide Compiler, is a type of compiler that processes the source code of a
program several times. It divides a large program into multiple smaller units, processes them, and
produces multiple intermediate codes. It is used to implement programming languages such as Java.
2.2.1 MULTI PASS COMPILER .

The structure of a typical four pass compiler (figure omitted here) is as follows :
• The first pass is the preprocessor, which produces input for the compiler and deals with macro processing,
augmentation, language extension, etc.
• The second pass is the heart of the compiler; it is made up of a lexical analyzer, a parser and a code
generator, and it translates source code into intermediate code.
• The third pass is the optimizer, which improves the quality of the intermediate code.
• The fourth pass translates the optimized code into real assembly language or some form of binary
executable code.
2.3 ONE PASS COMPILER VS MULTI PASS COMPILER.

ONE PASS COMPILER                               MULTI PASS COMPILER

1. It has a limited scope.                      It has a greater scope.
2. There is no code optimization.               There is code optimization.
3. There is no intermediate code generation.    There is intermediate code generation.
4. It takes minimum time to compile.            It takes more time to compile.
5. Memory consumption is lower.                 Memory consumption is higher.
6. It is used to implement programming          It is used to implement programming
   languages such as Pascal.                    languages such as Java.
2.4 TWO PASS COMPILER
A two pass compiler simplifies the retargeting process and allows multiple front ends.
Two pass Compiler is divided into two sections :
(a) Front end (b) Back end

[ Note : The division of the compilation process into phases (or passes) gives rise to the terms front end, middle end (rarely
heard today), and back end.]

2.4.1 FRONT END

• The front end consists of the phases that depend primarily on the source program and are largely independent
of the target machine.
• It translates the source code into an intermediate representation.
• The front end includes lexical analysis, syntax analysis, semantic analysis, intermediate code generation and
a symbol table.
2.4.2 BACK END
• The back end consists of the phases that depend on the target machine and do not depend on the source
program.
• It works with the intermediate representation to produce code in the target language.
• The back end includes the code optimization and code generation phases, with the necessary error handling and
symbol table operations.
3. MAJOR PARTS OF A COMPILER / TWO PHASES OF COMPILATION :
There are two major parts of a compiler which are as follows :
(a) Analysis (b) Synthesis
[3.1] Analysis :
In the analysis phase, an intermediate representation is created from the given source code/program.
The analysis part can be divided along the following phases:
(i)Lexical Analyzer
(ii)Syntax Analyzer
(iii) Semantic Analyzer
[3.2] Synthesis
In the synthesis phase, the target program is generated from the intermediate representation.
The synthesis part can be divided along the following phases:
(i) Intermediate Code Generator
(ii) Code Optimizer
(iii) Final Code Generator
4. PHASES OF A COMPILER :

1. Source Code/Program :
Source code is written by a programmer in some human-readable computer programming language. It is a high level
code and it is written in high level languages like C, C++, Java, Python etc
2. Lexical Analysis :
Lexical analysis is the first phase of compilation process. It takes source code as input and reads the source program
one character at a time and converts it into meaningful lexemes and represents these lexemes in the form of tokens.
3. Syntax Analysis :
Syntax analysis is the second phase of the compilation process. It takes tokens as input and generates a parse tree as
output. The main aim of this phase is to check whether the source code written by the programmer is
syntactically correct.
4. Semantic Analysis :
Semantic analysis is the third phase of compilation process. It checks whether the parse tree follows
the rules of language.
5. Intermediate Code Generation :
In intermediate code generation, the compiler translates the source code into an intermediate representation.
6. Code Optimization :
Code optimization improves the intermediate code so that the resulting program runs faster and takes
less space.
7. Code Generation :
Code generation is the final stage of the compilation process and it translates the intermediate code
into the machine code of the specified computer.
8. Target Program :
The target program is the output of the code generator. The output can be: (a) Assembly Language (b)
Relocatable Machine language
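The phases above can be illustrated for a single statement. The following Python sketch is hypothetical: the statement a = b + c * 2, the parse-tree encoding and the temporaries t1, t2 are invented for this example, and only phases 2, 3 and 5 are shown.

```python
import re

source = "a = b + c * 2"

# Phase 2, lexical analysis: characters -> tokens (a crude regex stands in
# for a real scanner here).
token_spec = r"[A-Za-z_]\w*|\d+|[=+*]"
tokens = re.findall(token_spec, source)
# tokens == ['a', '=', 'b', '+', 'c', '*', '2']

# Phase 3, syntax analysis: tokens -> parse tree (nested tuples here).
# '*' binds tighter than '+', so c * 2 is grouped first.
parse_tree = ("=", "a", ("+", "b", ("*", "c", "2")))

# Phase 5, intermediate code generation: tree -> three-address code,
# one operation per line, using compiler-generated temporaries.
three_address_code = [
    "t1 = c * 2",
    "t2 = b + t1",
    "a = t2",
]
```

Semantic analysis (phase 4) would check, for example, that a, b and c are declared and have compatible types before the intermediate code is emitted.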
5. COUSINS / RELATIVES OF A COMPILER :
The Cousins / Relatives of a compiler are as follows :
(a) Preprocessor (b) Assembler (c) Loader and Link-editor.
5.1 PREPROCESSOR
Preprocessor is a program which produces input for Compiler . It deals with macro processing,
augmentation, language extension, etc.
(a) Lexical Preprocessor - Lexical preprocessors are the lowest-level of preprocessors, as they only require
lexical analysis.
(b) Syntactic preprocessor - Syntactic preprocessors are used to customize the syntax of a language or to extend a
language inside a general purpose language.
5.2 ASSEMBLER
• An assembler converts assembly language code into machine language code.
• It converts the code line by line.
• It takes assembly-level code as input.
5.3 LOADER AND LINKER
 Loader : It loads the executable module to the main memory. It takes executable module generated by a
linker..
 Linker : It generates the executable module of a source program. It takes input of object code generated
by compiler/ assembler.
6. COMPILER vs INTERPRETER :
• A compiler is a complex computer program (or set of programs) that translates text written in a computer
language (the source language) into another computer language (the target language).
• It converts the source code into object code.
• It does not require source code for later execution.
• Example : C , C++

• An interpreter is a translating program that executes a high level language program directly
• It does not convert the source code into object code; instead it scans and executes it line by line
• It requires source code for later execution.
• Example : Python , MATLAB etc
COMPILER DESIGN
(MODULE 3- INTRODUCTION TO SOURCE CODE & LEXICAL
ANALYSIS)
1. SOURCE CODE :
• Source code is written by a programmer in some human-readable computer programming language. It is
a high level code and it is written in high level languages like C, C++, Java, Python etc
1.1 PURPOSES
• Source code is the input to a compiler or any other translator; it is read and maintained by
programmers and it can be changed over time.
1.2 QUALITY
The quality of source code depends on the general syntactic criteria and the syntactic elements a language provides.
(A)GENERAL SYNTACTIC CRITERIA
The primary purpose of syntax is to provide a notation for communication between the programmer and
programming language translator. It includes
(i) Readability : If the underlying structure of the source code, and the data it represents, is apparent from
an inspection of the source text, the source code is said to be readable.
(ii) Writability : The syntactic features that make a program easy to write are often in conflict with those
features that make it easy to read.
(iii) Ambiguity : Ambiguity is a central problem in every language design; a language should ideally provide a
unique meaning for every syntactic construct.
(iv) Verifiability : Verifiability is also related to source code. Understanding programming statements is easy,
but producing correct programs is hard.
(B) SYNTACTIC ELEMENTS OF SOURCE CODE
Using appropriate syntactic elements in source code enhances readability, writability and verifiability.
1.3 THE STRUCTURE OF A PROGRAM
• We can analyze the structure of a program in four levels
1. Lexical structure level 2. Syntactic structure level 3. Contextual structure level 4. Semantic structure level
(A) Lexical structure level
The lexical structure level is the lowest level, at which computer programs are viewed as a simple sequence of
lexical items called tokens. A token is a sequence of characters that represents a unit of information in
the source program. We can classify tokens as
(i) Identifiers (ii) Keywords (iii) Operators (iv) Separators (v) Literals
(B) Syntactic structure level
The syntactic level describes the way that program statements are constructed from tokens; it is defined in
terms of a context free grammar. The best known notations are BNF (Backus Naur Form) and EBNF (Extended
Backus Naur Form).
An EBNF example:
<Sum> = <Operand> <Operator> <Operand>
<Operand> = <number> / ( <Sum> )
<Operator> = + / -
(C) Contextual structure level
The contextual structure level is concerned with the 'context' of a statement: program statements usually contain
identifiers whose meaning is dictated by earlier statements (declarations).
int c = a + b; // invalid: a and b have not been declared
int a;         // valid
int b;
int c;
c = a + b;     // valid: a, b and c are declared

2. LEXICAL ANALYSER :
2.1 Lexical Analyzer
Lexical analyzer is the first phase of compilation process. It takes source code as input and reads the source program
one character at a time and converts it into meaningful lexemes and represents these lexemes in the form of tokens.
Lexical Analysis can be implemented with the Deterministic finite Automata.
e.g. (i) if (5 > 4) then goto
Tokens – if, (, 5, >, 4, ), then, goto
(ii) for (i = 0; i < 10; i++)
Tokens – for, (, i, =, 0, ;, i, <, 10, ;, i, ++, )
2.2 Implementation & Role of Lexical Analyzer
Lexical analyzer is classified into two stages : (a) Scanner (b) Evaluator / Lex Analysis
(A)Scanner : The first stage is the scanner which is based on a finite state machine. It has information embedded in it
about the various character sequences that can be found in any of the tokens it handles.
(B)Evaluator : The second stage is the evaluator, which goes over the characters of the lexeme to produce a value.
The type of the lexeme, along with its value, makes a token that may be passed to a parser.
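The scanner/evaluator split can be sketched as follows. This is a minimal illustration: the token classes, patterns and keyword set are simplified assumptions, and real lexical analyzers are usually generated by tools such as Lex/Flex rather than written this way.

```python
import re

# Keywords are scanned as identifiers, then reclassified by the evaluator.
KEYWORDS = {"if", "then", "goto", "for"}

TOKEN_PATTERNS = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"\+\+|[+\-*/<>=]=?"),   # '++' before '+' so it wins
    ("PUNCT",  r"[();]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_PATTERNS))

def tokenize(source: str):
    tokens = []
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue                 # scanner drops white space
        if kind == "ID" and lexeme in KEYWORDS:
            kind = "KEYWORD"         # evaluator reclassifies keywords
        tokens.append((kind, lexeme))
    return tokens
```

Running tokenize("if (5>4) then goto") reproduces the token list from example (i) above, with each lexeme paired with its token type.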
The main roles of a lexical analyzer are :
• It identifies tokens and enters them into the symbol table
• It removes white spaces and comments from the source program
• It correlates error messages with the source program
• It reads input characters from the source program
• It keeps track of line numbers.
3. BASIC TERMINOLOGIES :
3.1 Tokens :
A token is a sequence of characters that represents a unit of information in the source program.
The lexical structure level is the lowest level, at which computer programs are viewed as a simple sequence of
such lexical items, termed tokens. A token has only a single attribute: a pointer to the symbol
table.
Examples of tokens : Keywords: for, while, if, etc. Identifiers: variable names, function names, etc.
Operators: '+', '++', '-', etc. Separators: ',', ';', etc.
3.2 Lexemes : A lexeme is a sequence of characters in the source program that is matched by the pattern for
a token. It is identified by the lexical analyzer as an instance of that token. For example, in English, 'run, runs,
ran and running' are forms of the same lexeme.
3.3 Patterns :
A pattern is a rule describing the set of lexemes that represent a particular token. In the case of a keyword
used as a token, the pattern is simply the sequence of characters forming the keyword.
4. TRANSITION DIAGRAM :
 In a transition diagram, the boxes of a flowchart are drawn as circles called states. States are
connected by arrows called edges. As an intermediate step in the construction of a lexical analyzer, we
draw a pictorial representation of tokens that shows all the actions taken by the lexical analyzer or parser;
this is known as a transition diagram.

5. FLEX
Flex (fast lexical analyzer generator) generates a lexical analyzer in C or C++ from a specification of
token patterns. It is an alternative to Lex and is frequently used with the free Bison parser generator.
COMPILER DESIGN
(MODULE 4- INTRODUCTION TO SYNTAX ANALYSIS)
4.1 : PARSING & ITS TYPES
Parsing takes tokens as input and generates a parse tree as output. The main aim of parsing is to check
whether the source code written by the programmer is syntactically correct.
There are two types of parsing
(1) Top down parsing (2) Bottom up parsing
TOP DOWN PARSING                                    BOTTOM UP PARSING

1. Top down parsing starts evaluating the parse     Bottom up parsing starts evaluating the parse
   tree from the top and moves downwards to         tree from the lowest level of the tree and moves
   parse the other nodes.                           upwards to parse the other nodes.
2. Top down parsing searches for a production       Bottom up parsing searches for a production
   rule to be used to construct a string.           rule to be used to reduce a string.
3. Top down parsing uses leftmost derivation.       Bottom up parsing uses rightmost derivation.
4.2 : PARSING & ITS TYPES
(1) Top down parsing
(1.1) Recursive descent parsing/parser : It is a common form of top-down parsing. It is called recursive
as it uses recursive procedures to process the input. Recursive descent parsing suffers from backtracking.
Backtracking means, if one derivation of a production fails, the syntax analyzer restarts the process using
different rules of same production.
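A recursive-descent parser can be sketched for the <Sum> grammar given in Module 3. This is an illustrative Python sketch with minimal error handling; note that because one token of lookahead (a peek for "(") identifies each operand alternative, this particular grammar needs no backtracking.

```python
import re

# Grammar (from Module 3):
#   <Sum>      ::= <Operand> <Operator> <Operand>
#   <Operand>  ::= number | "(" <Sum> ")"
#   <Operator> ::= "+" | "-"
# Each nonterminal becomes a (mutually) recursive procedure.

def parse_sum(tokens, pos=0):
    left, pos = parse_operand(tokens, pos)
    op = tokens[pos]; pos += 1                 # <Operator>
    assert op in "+-", f"expected + or -, got {op}"
    right, pos = parse_operand(tokens, pos)
    return (op, left, right), pos

def parse_operand(tokens, pos):
    if tokens[pos] == "(":                     # lookahead decides the alternative
        node, pos = parse_sum(tokens, pos + 1)
        assert tokens[pos] == ")", "expected )"
        return node, pos + 1
    return int(tokens[pos]), pos + 1           # number

def parse(source):
    tokens = re.findall(r"\d+|[()+\-]", source)
    tree, pos = parse_sum(tokens)
    assert pos == len(tokens), "trailing input"
    return tree

# parse("(1+2)-3") builds the tree ('-', ('+', 1, 2), 3)
```

A grammar where the first token does not determine the production would force the parser to try one derivation, fail, and restart with another rule of the same production, which is the backtracking described above.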
(1.2) Non Recursive descent parsing/parser / LL(1) Parser :
A top-down parser that uses one token of lookahead is called an LL(1) parser.
The first L means the input is scanned from Left to right, the second L means the parser produces a Leftmost
derivation, and the 1 means it uses one token of lookahead.
(1.3) LL Parser
An LL parser accepts an LL grammar. LL grammars are a subset of the context-free grammars, with some restrictions
that give a simplified version.
The first L of LL stands for left-to-right scanning and the second L for leftmost derivation. Parsing starts
from the start symbol (S).
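A non-recursive (table-driven) LL(1) parser can be illustrated with the tiny grammar S -> a S b | epsilon, a hypothetical example chosen for this sketch. Because one token of lookahead selects exactly one production, the parse table has a single entry per (nonterminal, lookahead) pair, which is precisely what makes the grammar LL(1).

```python
# Parse table for S -> a S b | epsilon ($ marks end of input).
TABLE = {
    ("S", "a"): ["a", "S", "b"],   # S -> a S b
    ("S", "b"): [],                # S -> epsilon (b is in FOLLOW(S))
    ("S", "$"): [],                # S -> epsilon
}

def ll1_parse(word: str) -> bool:
    tokens = list(word) + ["$"]
    stack = ["$", "S"]             # start symbol on top of the stack
    i = 0
    while stack:
        top = stack.pop()
        lookahead = tokens[i]
        if top == lookahead:       # terminal (or $): match and advance
            i += 1
        elif (top, lookahead) in TABLE:
            # Replace the nonterminal by its production, leftmost symbol on top.
            stack.extend(reversed(TABLE[(top, lookahead)]))
        else:
            return False           # no table entry: syntax error
    return i == len(tokens)
```

The explicit stack replaces the recursive procedures of a recursive-descent parser, which is why this style is called non-recursive descent.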
