
Compiler Construction
LECTURE 1
General Information
Instructor: Hussain Shah
Midterm: 30 marks
Final term: 50 marks
Assignments: 20 marks
Textbook: Compilers: Principles, Techniques, and Tools by Aho, Sethi and Ullman

Compiler General Overview
• We generally write computer programs in a high-level language. A high-level
language is one that humans can understand; it contains words and phrases from
English (or another natural language). A computer, however, does not understand a
high-level language. It only understands programs written as binary 0s and 1s,
called machine code. A program written in a high-level language is called
source code.
• We need to convert source code into machine code, and this is done by compilers
and interpreters. Hence, a compiler or an interpreter is a program that converts a
program written in a high-level language into machine code understood by the
computer.
Difference between Compiler and Interpreter
• A compiler translates the whole program into machine code.
• An interpreter translates the program one statement at a time.
• A compiler generates error messages only after scanning the whole
program. Hence debugging is comparatively hard.
• An interpreter continues translating the program until the first error is
met, at which point it stops. Hence debugging is easier.
• Programming languages such as Python and Ruby typically use interpreters.
• Programming languages such as C and C++ use compilers.
What is a Compiler
• A compiler is software that converts one form of language into another.
OR
• It is a type of software used to convert a source program (high-level
language) into a target program (machine language).
• The source program can be written in any traditional language such as
FORTRAN, BASIC or Pascal.
• The target program is the output of the compiler and is machine dependent.
Compiler (continued)
• The target program can take different forms, i.e.
  • Machine language / code
  • Assembly language
History of the Compiler
• The first compilers were designed in the 1950s.
• The aim of the early compilers was to translate arithmetic formulas into
machine code.
• One of the first widely used compilers was the FORTRAN (FORmula TRANslation)
compiler.
Additional Task of a Compiler
• An additional task of the compiler is to report the presence of errors in
the source program.
Why Build Compilers
• Compilers provide an essential interface between applications and the
architecture (machine).
• They enable us to construct effective applications written in high-level
programming languages.
• They increase programmer productivity.
Why Build Compilers
• Utilize the lower-level architecture effectively.
• Programs can be portable.
• Provide better maintainability.
• Bridge the gap between high-level (HL) languages and low-level (LL) machine
code.
• Shield the application developer from low-level language details.
What are Compilers
• Translate information from one representation to another.
• Usually information = program.
• Translate a high-level language into a target language.
Typical Compilation
High-level source code -> Compiler -> Low-level machine code
Examples
Typical compilers:
• Visual C/C++ (VC, VC++), GCC, javac, and compilers for FORTRAN and Pascal.
• If you wrote your program in C, you compile it with a C compiler such as
GCC (GNU Compiler Collection).
Source Code
int expr( int n )
{
    int d;
    d = 4*n*n*(n+1)*(n+1);
    return d;
}
Source Code
• Optimized for human readability.
• Matches human notions of grammar and formulas.
• Uses named constructs such as variables and procedures.
Assembly Code
        .globl _expr
_expr:
        pushl   %ebp
        movl    %esp,%ebp
        subl    $24,%esp
        movl    8(%ebp),%eax
        movl    %eax,%edx
        leal    0(,%edx,4),%eax
        movl    %eax,%edx
        imull   8(%ebp),%edx
        movl    8(%ebp),%eax
        incl    %eax
        imull   %eax,%edx
        movl    8(%ebp),%eax
        incl    %eax
        imull   %eax,%edx
        movl    %edx,-4(%ebp)
        movl    -4(%ebp),%edx
        movl    %edx,%eax
        jmp     L2
        .align 4
L2:
        leave
        ret
Assembly Code
• Optimized for hardware.
• Consists of machine instructions.
• Uses registers and unnamed memory locations.
• Much harder for humans to understand.
How to Translate
Correctness: the generated machine code must execute exactly the same
computation as the source code.
How to Translate
• Translation is a complex process: the source language and the generated code
are very different.
• We need to structure the translation.
• A compiler automates this translation, converting a high-level language into
a low-level language.
Properties of a Compiler
• It must generate correct executable code.
• The input and output programs must be equivalent: the compiler should
preserve the meaning of the input program.
• The output program should run fast.
• The compiler itself should be fast.
• The compiler should provide good diagnostics for programming errors.
• Compile time should be proportional to code size.
Difference between Source Code and Object Code
• Source Code:
  • A set of instructions written in any language other than machine language is
  called source code.
  • It is not directly understood by the machine (computer).
  • It is in the form of text.
  • It is human readable.
  • It is written by a human (the programmer).
  • It is the input to the language translator.
• Object Code:
  • A set of instructions written in machine language is called object code. It is
  also known as machine code.
  • It is the only code that is directly understood by the machine (computer).
  • It is in the form of binary numbers.
  • It is machine (computer) readable.
  • It is generated by the language translator.
  • It is the output of the language translator.
Source Program
• Source programs are written in high-level programming languages such as
Pascal, C++ and Java.
Target Program
• It is also called the output of the compiler.
• It is machine dependent.
• It can be written in the following forms:
  • Machine code
  • Assembly language / code
Interpreter
• Interpreters are not much different from compilers.
• They also convert the high-level language into machine-readable binary
equivalents.
• Each part of the code is interpreted and then executed separately, in
sequence. If an error is found in a part of the code, the interpreter stops
without translating the code that follows.
Interpreter
• Takes one instruction (line) of the source program and executes its code
immediately.
• Interpreters are slower than compilers because instructions are translated one
at a time until the end of the program, and the object code is not stored and
reused.
• As a rough figure, interpretation can be about 20 times slower than running
compiled code; the actual factor depends on the language and the implementation.
Interpreter
• It can identify only one error at a time.
General Overview
The big picture
• The compiler is part of the program development environment.
• The other typical components of this environment are the editor, assembler,
linker, loader, debugger, profiler, etc.
• The compiler (and all the other tools) must support each other for easy program
development.
• When these tools support each other, program development becomes a lot easier.
Language Processing System
• Any computer system is made of hardware and software. The hardware understands
a language that humans cannot easily understand, so we write programs in a
high-level language, which is easier for us to understand and remember.
• These programs are then fed into a series of tools and OS components to obtain
code that the machine can use. This is known as a language processing system.
A Language Processing System
• The high-level language is converted into binary language in several phases.
• A compiler is a program that converts a high-level language to assembly language.
• Similarly, an assembler is a program that converts assembly language to
machine-level language.
• Let us first understand how a program, using a C compiler, is executed on a host
machine:
  • The user writes a program in C (a high-level language).
  • The C compiler compiles the program and translates it into an assembly program
  (low-level language).
  • An assembler then translates the assembly program into machine code (object
  code).
  • A linker is used to link all the parts of the program together for execution
  (executable machine code).
  • A loader loads all of them into memory and then the program is executed.
• Before diving into the concepts of compilers, we should understand a few
other tools that work closely with compilers.
Explanation
• Preprocessor: In computer science, a preprocessor is a program that processes its
input data to produce output that is used as input to another program. The output is
said to be a preprocessed form of the input data, which is often used by subsequent
programs such as compilers. Preprocessor directives start with #.
• The compiler may produce an assembly language program as its output, because
assembly language is easier to produce as output and easier to debug.
• The assembly language is then processed by a program called an assembler, which
produces relocatable machine code as its output.
• An assembler can be regarded as a special type of compiler.
• Linker: High-level languages come with built-in header files and libraries.
These libraries are predefined and contain basic functions that are essential for
executing the program. Calls to these functions are connected to their library
definitions by a program called the linker. If the linker cannot find the
definition of a function, it reports an error.
• Loader: A program that loads the executable file into the primary memory of the
machine. The program and data stack are created and the registers are initialized.
The Structure of a Compiler (1)
Or Architecture
Any compiler must perform two major tasks:
• Analysis
• Synthesis
The Structure of a Compiler (2)
• There are two parts of compilation.
1. Analysis
Known as the front end of the compiler, the analysis phase reads the source
program, divides it into its core parts and then checks its grammatical
structure for syntax errors. The analysis phase generates an intermediate
representation of the source program and a symbol table, which are fed to the
synthesis phase as input.
• Symbol table: The symbol table is an important data structure created
and maintained by the compiler in order to store information about the
occurrence of various entities such as variable names, function
names, objects, classes, etc.
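A minimal sketch of a symbol table in C may make the idea concrete. The fixed-size array, the field sizes and the names `symtab_insert` and `symtab_lookup` are assumptions made for this illustration; real compilers usually use hash tables and keep much more information per entry (scope, address, and so on).

```c
#include <stdio.h>
#include <string.h>

#define MAX_SYMBOLS 64   /* arbitrary capacity chosen for this sketch */

/* One entry: the name of an entity and a short description of its type. */
struct symbol {
    char name[32];
    char type[16];
};

struct symbol_table {
    struct symbol entries[MAX_SYMBOLS];
    int count;
};

/* Return the index of name in the table, or -1 if it is absent. */
int symtab_lookup(const struct symbol_table *t, const char *name)
{
    for (int i = 0; i < t->count; i++) {
        if (strcmp(t->entries[i].name, name) == 0)
            return i;
    }
    return -1;
}

/* Insert a new entry. Returns its index, or -1 if the name already
   exists (a redeclaration) or the table is full. */
int symtab_insert(struct symbol_table *t, const char *name, const char *type)
{
    if (symtab_lookup(t, name) != -1 || t->count == MAX_SYMBOLS)
        return -1;
    struct symbol *s = &t->entries[t->count];
    snprintf(s->name, sizeof s->name, "%s", name);
    snprintf(s->type, sizeof s->type, "%s", type);
    return t->count++;
}
```

Starting from a table with `count` set to 0, inserting "n" with type "int" succeeds, while inserting "n" a second time returns -1, which is how a compiler could notice a duplicate declaration.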
Analysis (continued)
• If the analysis part detects that the source program is syntactically ill formed
or semantically unsound, it must provide informative messages back to the user so
that the user can take corrective action.
• The analysis part also collects information about the source program and stores
it in a data structure called the symbol table, which is passed along with the
intermediate representation to the synthesis part.
The Structure of a Compiler (continued)
2. Synthesis Phase
Known as the back end of the compiler, the synthesis phase generates the
target program with the help of the intermediate code representation and the
symbol table.
Why we need an intermediate code
Source code can be translated directly into its target machine code, so why do
we need to translate the source code into an intermediate code that is then
translated to the target code? Let us see the reasons why we need an
intermediate code.
Why we need an intermediate code
What is an intermediate representation?
An intermediate representation (IR) is the data structure or code used internally
by a compiler to represent source code. An IR is designed to be conducive to
further processing, such as optimization and translation. A "good" IR must be
accurate, that is, capable of representing the source code without loss of
information.
Why we need an intermediate code
• If a compiler translates the source language to its target machine language
without the option of generating intermediate code, then a full native compiler
is required for each new machine.
• Intermediate code eliminates the need for a new full compiler for every unique
machine by keeping the analysis portion the same for all the compilers.
Why we need an intermediate code
• The second part of the compiler, synthesis, is changed according to the target
machine.
• It becomes easier to improve code performance by applying code optimization
techniques to the intermediate code.
• Code optimization is a program transformation technique that tries to improve
the code by making it consume fewer resources (e.g. CPU, memory) and run faster.
Intermediate Representation
• Intermediate code can be represented in a variety of ways, each with its own
benefits.
• High-level IR: High-level intermediate code is very close to the source
language itself. It can be generated easily from the source code, and code
modifications that enhance performance are easy to apply. It is less suitable
for target machine optimization.
• Low-level IR: This is close to the target machine, which makes it suitable for
register and memory allocation, instruction selection, etc. It is good for
machine-dependent optimizations.
Phases of Compiler
Six Phases of a Compiler

1. Lexical Analysis
• This phase is also called the scanner.
• This phase reads the source program text from left to right and divides it into
pieces called tokens.
• That is, this phase takes source code as input and converts it to tokens.
• Token: A sequence of characters that can be treated as a single logical
entity, e.g. identifiers (variables), keywords, operators, constants.
Example: (figure omitted)
Note: The list of tokens can be stored in the symbol table.
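The scanner described above can be sketched in a few lines of C. This is an illustrative sketch only: the token kinds, the `struct token` layout and the function name `tokenize` are assumptions, keywords are not distinguished from identifiers, and only single-character operators are recognized.

```c
#include <ctype.h>
#include <string.h>

enum token_kind { TOK_IDENTIFIER, TOK_CONSTANT, TOK_OPERATOR };

struct token {
    enum token_kind kind;
    char text[32];
};

/* Scan src from left to right and fill out[] with at most max tokens.
   Returns the number of tokens found, or -1 on a character that cannot
   start any token (a lexical error). */
int tokenize(const char *src, struct token *out, int max)
{
    int n = 0;
    size_t i = 0;
    while (src[i] != '\0') {
        unsigned char c = (unsigned char)src[i];
        if (isspace(c)) { i++; continue; }
        if (n == max) break;                     /* output array is full */
        size_t start = i;
        if (isalpha(c) || c == '_') {            /* identifier */
            while (isalnum((unsigned char)src[i]) || src[i] == '_') i++;
            out[n].kind = TOK_IDENTIFIER;
        } else if (isdigit(c)) {                 /* integer constant */
            while (isdigit((unsigned char)src[i])) i++;
            out[n].kind = TOK_CONSTANT;
        } else if (strchr("=+-*/();", c) != NULL) {
            i++;                                 /* one-character operator */
            out[n].kind = TOK_OPERATOR;
        } else {
            return -1;                           /* not part of the language */
        }
        size_t len = i - start;
        if (len >= sizeof out[n].text) len = sizeof out[n].text - 1;
        memcpy(out[n].text, src + start, len);   /* copy the lexeme */
        out[n].text[len] = '\0';
        n++;
    }
    return n;
}
```

For the input `x = a + b * 42` this sketch produces seven tokens: four identifiers or constants and three operators. A character such as `@` makes it return -1.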


2) Syntax Analysis
• The next phase is called syntax analysis or parsing.
• It takes the tokens produced by lexical analysis as input and generates a parse
tree (or syntax tree).
• In this phase, token arrangements are checked against the source language
grammar, i.e. the parser checks whether the expression made by the tokens is
syntactically correct.
• Parser: A program that divides code up into its functional components.
• Two types of parsing:
  • Top-down parsing
  • Bottom-up parsing
Example: (figure omitted)
3) Semantic Analysis
• Semantic analysis checks whether the parse tree constructed follows the rules of
the language.
• For example, it checks that values are assigned between compatible data types
and flags operations such as adding a string to an integer.
• Also, the semantic analyzer keeps track of identifiers, their types and
expressions, and checks whether identifiers are declared before use.
Example (figure omitted)
• The first error is a semantic error: the variable is declared with an integer
type but is assigned 'n', which does not follow the type rules, so the error is
reported by this phase.
• The second error is a syntax error.
4) Intermediate Code Generation
• After semantic analysis, the compiler generates an intermediate code of the
source code for the target machine.
• The intermediate code has two properties:
  • Easy to produce
  • Easy to translate into the target program
• It sits between the high-level language and the machine language.
• The intermediate code should be generated in such a way that it is easy to
translate into the target machine code.
5) Code Optimization
• The next phase performs code optimization on the intermediate code.
• Optimization can be thought of as removing unnecessary lines of code and
arranging the sequence of statements to speed up program execution without
wasting resources (CPU, memory).
Example: (figures showing the code before and after optimization omitted)
Note: After optimization, unnecessary statements have been removed.
6) Code Generation
• In this phase, the code generator takes the optimized representation of the
intermediate code and maps it to the target machine language.
A More Detailed Example
• Now we will see the six phases of a compiler with an example.
• Let us take one statement that is common in many high-level languages:
  x = a + b * c
• Here x, a, b, c are identifiers and =, +, * are operators.
1. Lexical Analyzer
• For the above example, the lexical analyzer produces
  id=id1+id2*id3
• (For simplicity, we have written it like this. We can also write it as
id op id1 op1 id2 op2 id3, where id, id1, id2, id3 represent identifiers and
op, op1, op2 represent operators.)
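2. Syntax Analyzer (Parser)
• For the above example, the parser builds a syntax tree (figure omitted) with =
at the root, id as its left child and + as its right child; + has the children
id1 and *, and * has the children id2 and id3, because * binds more tightly
than +.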
3. Semantic Analyzer
• The semantic analyzer checks the semantics of an expression (statement). This
means it checks whether all the entities in the statement follow the rules of
the language.
• For example, we cannot assign a new value to a constant, i.e. a constant
should not be on the left-hand side of the = operator. Rules of this kind are
checked by the semantic analyzer.
• The above example is semantically correct.
4. Intermediate Code Generator
• The intermediate code generator generates code in terms of temporary variables.
• For the above example, the intermediate code will be as follows:
temp1=id2*id3
temp2=id1+temp1
id=temp2
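The three lines of intermediate code above can be produced mechanically. The following sketch, written for this note rather than taken from the lecture, parses an assignment with a small recursive-descent parser and emits one temporary per operator. The grammar (only identifiers, `+` and `*`), the `struct gen` state and the function name `generate_tac` are assumptions made for the illustration.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

struct gen {
    const char *p;      /* current position in the input */
    char *out;          /* buffer receiving the generated code */
    size_t cap, len;    /* buffer capacity and bytes used so far */
    int next_temp;      /* number of the next temporary */
    int error;          /* set to 1 on a syntax error or full buffer */
};

static void skip_spaces(struct gen *g)
{
    while (isspace((unsigned char)*g->p)) g->p++;
}

/* Append "dst=a<op>b" (or "dst=a" when op is 0) as one line of output. */
static void emit(struct gen *g, const char *dst, const char *a, char op, const char *b)
{
    int n;
    if (g->len >= g->cap) { g->error = 1; return; }
    if (op)
        n = snprintf(g->out + g->len, g->cap - g->len, "%s=%s%c%s\n", dst, a, op, b);
    else
        n = snprintf(g->out + g->len, g->cap - g->len, "%s=%s\n", dst, a);
    if (n < 0 || (size_t)n >= g->cap - g->len) { g->error = 1; return; }
    g->len += (size_t)n;
}

/* factor -> identifier (at most 15 characters in this sketch) */
static void factor(struct gen *g, char *result)
{
    size_t n = 0;
    skip_spaces(g);
    while (isalnum((unsigned char)*g->p) && n < 15) result[n++] = *g->p++;
    result[n] = '\0';
    if (n == 0) g->error = 1;
}

/* term -> factor { '*' factor } */
static void term(struct gen *g, char *result)
{
    char right[16], temp[16];
    factor(g, result);
    skip_spaces(g);
    while (!g->error && *g->p == '*') {
        g->p++;
        factor(g, right);
        if (g->error) break;
        snprintf(temp, sizeof temp, "temp%d", g->next_temp++);
        emit(g, temp, result, '*', right);
        strcpy(result, temp);
        skip_spaces(g);
    }
}

/* expr -> term { '+' term } */
static void expr(struct gen *g, char *result)
{
    char right[16], temp[16];
    term(g, result);
    skip_spaces(g);
    while (!g->error && *g->p == '+') {
        g->p++;
        term(g, right);
        if (g->error) break;
        snprintf(temp, sizeof temp, "temp%d", g->next_temp++);
        emit(g, temp, result, '+', right);
        strcpy(result, temp);
        skip_spaces(g);
    }
}

/* Translate "target = expression" into three-address code.
   Returns 0 on success and -1 on a syntax error or a full buffer. */
int generate_tac(const char *src, char *out, size_t cap)
{
    struct gen g = { src, out, cap, 0, 1, 0 };
    char target[16], value[16];
    if (cap > 0) out[0] = '\0';
    factor(&g, target);
    skip_spaces(&g);
    if (g.error || *g.p != '=') return -1;
    g.p++;
    expr(&g, value);
    skip_spaces(&g);
    if (g.error || *g.p != '\0') return -1;
    emit(&g, target, value, 0, "");
    return g.error ? -1 : 0;
}
```

Because `term` is called from `expr`, the multiplication is translated before the addition, which is why `temp1` holds `id2*id3`. Calling `generate_tac("id = id1 + id2 * id3", buf, sizeof buf)` fills `buf` with the same three lines shown in the slide.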
5. Code Optimizer
• The code optimizer optimizes (in this case, reduces) the size of the code where
possible.
• For the above example, the copy through temp2 is removed:
temp1=id2*id3
id=id1+temp1
6. Code Generator
• The code generator generates assembly code in terms of registers from the input
it receives from the steps above.
• For the above example (suppose id1, id2, id3 are held in registers AX, BX, CX
respectively), the assembly code will be
  MUL BX, CX
  ADD AX, CX
  MOV CX, id
• Here AX, BX and CX are registers, and each instruction is read as
"source, destination": CX receives id2 * id3, then id1 + id2 * id3, and is
finally stored into id (x).
Error
• An error is an abnormal condition in the source program that either stops the
compilation or results in undesired output.
• Error handler: The tasks of the error-handling process are to detect each error,
report it to the user, and then devise and apply a recovery strategy to handle
the error. Throughout this process, the processing time of the program should
not suffer. Errors may show up as blank entries in the symbol table.
The two basic error-handling tasks of a compiler
• Error detection
• Error recovery
Error Handler
• During the compilation process, each phase can encounter errors and has to deal
with them. For this purpose the error handler contains error-handling routines.
• Error-handling routines: When an abnormal condition occurs, the compiler must be
able to recover sufficiently to ensure that the error does not cause processing
of the source code to terminate abnormally.
Types of Errors
• There are five types of errors.
1. Lexical Errors:
When the remaining characters in the input do not form a valid token, the error
is called a lexical error, e.g. badly formed identifiers or constants, symbols
that are not part of the language, and badly formed comments. This includes
misspellings of identifiers, keywords, etc.
Types of Errors
2. Syntax Errors:
When the stream of tokens violates the grammar rules of a language, it is a
syntax error.
Example:
• Statements not ended with a semicolon (;)
• S= C+/d;
• I= a/-b;
• Unbalanced parentheses or curly braces
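As an illustration of the last item, unbalanced parentheses or braces can be detected with a simple stack. The sketch below is an assumption-laden simplification (the name `brackets_balanced` is hypothetical, and it ignores brackets inside string literals and comments); a real parser detects this as part of checking the grammar.

```c
#include <stddef.h>

/* Return 1 if every '(' and '{' in src is closed by the matching ')' or '}'
   in the right order, and 0 otherwise. */
int brackets_balanced(const char *src)
{
    char stack[256];          /* open brackets seen so far; size is arbitrary */
    size_t depth = 0;
    for (; *src != '\0'; src++) {
        if (*src == '(' || *src == '{') {
            if (depth == sizeof stack) return 0;   /* too deep for this sketch */
            stack[depth++] = *src;
        } else if (*src == ')' || *src == '}') {
            char expected = (*src == ')') ? '(' : '{';
            if (depth == 0 || stack[--depth] != expected) return 0;
        }
    }
    return depth == 0;        /* anything left open is also an error */
}
```

For example, `{ s = (c + d); }` is accepted, while `{ s = (c + d; }` is rejected because the `}` arrives while `(` is still open.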
Types of Errors
3. Semantic Errors:
Semantic errors occur when the operations of the source program are not
meaningful.
Example:
• Declaration of the same identifier multiple times in the same scope, i.e.
  int a;
  int a;
• Use of undeclared variables.
Types of Errors
4. Logical Errors: A logical error is an error in the algorithm of the source
program that causes undesired output.
• A logical error occurs due to a poor understanding of the problem. For instance,
if you want the remainder of a number modulo 4 (e.g. a % 4) but write the program
to divide instead (e.g. a / 4), this is a logical error.
• For instance, 10 % 3 is 1 because 10 divided by 3 leaves a remainder of 1.
Example:
• Infinite loop
• Division by zero
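The modulo versus division mix-up described above can be shown directly in C. Both functions below compile without any diagnostic, which is what makes a logical error hard to find; the function names are made up for this illustration.

```c
/* Intended behaviour: return the remainder of a divided by 4. */
int remainder_by_4(int a)
{
    return a % 4;
}

/* Logical error: compiles cleanly but computes the quotient instead. */
int remainder_by_4_buggy(int a)
{
    return a / 4;
}
```

With a = 11 the correct version returns 3 while the buggy version returns 2, and the compiler cannot tell which of the two the programmer meant.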
Types of Errors
5. Spurious Errors:
Spurious errors are errors introduced by the compiler during error recovery,
e.g. if we call a function fi() but during error recovery the compiler
automatically "corrects" it to if(), this will create problems in the program.