Construction
LECTURE 1
General Information
Instructor Hussain Shah
Midterm 30 marks
Final term 50
Assignments 20
Text Book Compilers – Principles,
Techniques and Tools by
Aho, Sethi and Ullman
Compiler general overview
We generally write a computer program in a high-level language. A high-level
language is one that humans can understand; it contains words and phrases from
English (or another natural language). A computer, however, does not understand
a high-level language. It only understands programs written in binary (0s and 1s),
called machine code. A program written in a high-level language is called
source code.
We need to convert the source code into machine code, and this is done by
compilers and interpreters. Hence, a compiler or an interpreter is a program
that converts a program written in a high-level language into machine code
understood by the computer.
Difference between Compiler and
Interpreter
A compiler translates the whole program into machine code.
An interpreter translates the program one statement at a time.
A compiler generates error messages only after scanning the whole
program. Hence debugging is comparatively hard.
An interpreter continues translating the program until the first error is
met, at which point it stops. Hence debugging is comparatively easy.
Programming languages like Python and Ruby use interpreters.
Programming languages like C and C++ use compilers.
What is a Compiler
A compiler is software that converts one form of language into
another.
OR
It is a type of software used to convert a source
program (high-level language) into a target program (machine
language).
The source program can be written in any traditional language
such as FORTRAN, BASIC or PASCAL.
The target program is the output of the compiler and is
machine dependent.
[Figure: high-level source code -> Compiler -> low-level machine code]
Examples
Typical compilers:
VC, VC++ (Visual C++), GCC, javac,
FORTRAN, Pascal.
If you wrote your program in C, you
compile it with a C compiler such as
GCC (GNU Compiler Collection).
Source Code
Correctness:
The generated machine code must
execute exactly the same computation
as the source code.
How to Translate
Translation is a complex process:
the source language and the generated code are very
different.
We need to structure the translation.
A compiler can easily translate (convert)
a high-level language into a low-level
language.
Properties of a Compiler
It must generate correct executable code.
The input and the output program must be equivalent; the
compiler should preserve the meaning of the input program.
The output program should run fast.
The compiler itself should be fast.
The compiler should provide good diagnostics for
programming errors.
Compile time should be proportional to code size.
Difference between Source Code and
Object Code
Source Code:
The set of instructions written in any language other than machine language is
called source code.
It is not directly understood by the machine (computer).
It is in the form of text.
It is human readable.
It is generated by human (programmer).
It is the input to the language translator.
Object Code:
The set of instructions written in machine language is called object code. It is
also known as machine code.
It is the only code which is directly understood by the machine (computer).
It is in the form of binary numbers.
It is machine (computer) readable.
It is generated by the language translator.
It is the output of the language translator.
Target Program
It is also called the output of the compiler.
It is machine dependent.
It can be written in the following forms:
Machine code
Assembly language/code
Interpreter
The other typical components of this environment are editor, assembler, linker,
loader, debugger, profiler etc.
The compiler (and all other tools) must support each other for easy program
development.
If these tools support each other, then program development
becomes a lot easier.
Language processing system
We have learnt that any computer system is made of hardware and software. The
hardware understands a language that humans cannot understand. So we write
programs in a high-level language, which is easier for us to understand and
remember.
These programs are then fed into a series of tools and OS components to get the
desired code that can be used by the machine. This is known as a language
processing system.
A Language processing system
The high-level language is converted into binary language in various phases.
A compiler is a program that converts high-level language to assembly language.
Similarly, an assembler is a program that converts the assembly language to
machine-level language.
Let us first understand how a program is executed on a host machine when it is
built with a C compiler.
The user writes a program in the C language (a high-level language).
The C compiler compiles the program and translates it into an assembly program
(a low-level language).
An assembler then translates the assembly program into machine code (object code).
A linker is used to link all the parts of the program together for execution
(executable machine code).
A loader loads all of them into memory and then the program is executed.
Before diving straight into the concepts of compilers, we should understand a few
other tools that work closely with compilers.
Explanation
Preprocessor: In computer science, a preprocessor is a program that processes its input
data to produce output that is used as input to another program. The output is said to be a
preprocessed form of the input data, which is often used by subsequent programs
such as compilers.
Example: preprocessor directives in C start with #, such as #include and #define.
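As a rough illustration of what a preprocessor does, here is a minimal sketch in Python that handles only simple, argument-free #define directives. The function name preprocess and the whole-word replacement rule are assumptions made for this example; a real C preprocessor also handles #include, conditional directives and macros with arguments.

```python
import re
from typing import Dict, List


def preprocess(source: str) -> str:
    """Expand object-like '#define NAME value' macros (toy sketch only)."""
    macros: Dict[str, str] = {}
    output: List[str] = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#define"):
            parts = stripped.split(None, 2)
            if len(parts) >= 2:
                # Record the macro; the directive itself is not copied to the output.
                macros[parts[1]] = parts[2] if len(parts) == 3 else ""
            continue
        for name, value in macros.items():
            # Replace whole words only, so MAX does not match inside MAXIMUM.
            pattern = rf"\b{re.escape(name)}\b"
            line = re.sub(pattern, lambda _m, v=value: v, line)
        output.append(line)
    return "\n".join(output)


print(preprocess("#define MAX 10\nint a[MAX];"))
```

The output of this step is ordinary source text with the directives removed, which is what the compiler proper receives.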
Compiler: The compiler may produce an assembly language program as its output, because
assembly language is easier to produce as output and easier to debug.
Assembler: The assembly language is then processed by a program called an assembler,
which produces relocatable machine code as its output.
You can think of an assembler as a special type of compiler.
Linker: High-level languages come with built-in header files and libraries.
These libraries are predefined and contain basic functions that are
essential for executing the program. Calls to these functions are connected to
their library definitions by a program called the linker. If the linker cannot find
the library definition of a function, it reports an error (for example, an
undefined reference) and no executable is produced.
Loader: A program that loads the executable file into the primary memory of the
machine. It loads the executable code into memory, creates the program and data
stack, and initializes the registers.
The Structure of a Compiler (or Architecture)
Any compiler must perform two major tasks: analysis and synthesis.
There are two parts of compilation:
1. Analysis
Known as the front-end of the compiler, the analysis phase
reads the source program, divides it into core parts and
then checks its grammatical structure for syntax errors.
The analysis phase generates an intermediate representation of the
source program and a symbol table, which are fed to the
synthesis phase as input.
Symbol table: The symbol table is an important data structure created
and maintained by compilers in order to store information about the
occurrence of various entities such as variable names, function
names, objects, classes, etc.
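A symbol table can be sketched as a stack of dictionaries, one per scope. The class and field names below (Symbol, SymbolTable, kind, type_name) are hypothetical and chosen only for this illustration; real compilers store much more information, such as memory offsets and source positions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Symbol:
    name: str
    kind: str       # e.g. "variable" or "function"
    type_name: str  # e.g. "int"


class SymbolTable:
    """A stack of scopes; each scope maps a name to its Symbol."""

    def __init__(self) -> None:
        self.scopes: List[Dict[str, Symbol]] = [{}]  # start with the global scope

    def enter_scope(self) -> None:
        self.scopes.append({})

    def exit_scope(self) -> None:
        self.scopes.pop()

    def declare(self, symbol: Symbol) -> bool:
        """Add a symbol to the current scope; return False on redeclaration."""
        current = self.scopes[-1]
        if symbol.name in current:
            return False
        current[symbol.name] = symbol
        return True

    def lookup(self, name: str) -> Optional[Symbol]:
        """Search from the innermost scope outwards."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None
```

Declaring a name twice in one scope returns False, which is the hook a later phase could use to report a duplicate declaration.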
Analysis (continued)
If the analysis part detects that the source program is syntactically ill-formed or
semantically unsound, it must provide informative messages so that the user can
take corrective action.
The analysis part also collects information about the source program and stores it
in a data structure called the symbol table, which is passed along with the
intermediate representation to the synthesis part.
2. Synthesis
Known as the back-end of the compiler, the synthesis phase generates the
target program with the help of the intermediate code representation and the
symbol table.
Why we need an intermediate code
1. Lexical Analysis
This phase is also called the scanner.
It reads the source program text from left to right and divides it into
pieces called tokens.
In other words, this phase takes source code as input and converts it into tokens.
Token: a sequence of characters that can be treated as a single logical
entity,
e.g. identifiers (variables), keywords, operators and constants.
Example:
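A minimal sketch of a scanner in Python, assuming a small hypothetical token set (the keyword list, token names and regular expressions below are chosen for illustration and do not describe any particular language):

```python
import re
from typing import List, Tuple

KEYWORDS = {"int", "float", "if", "else", "while", "return"}

# Order matters: earlier patterns are tried first.
TOKEN_SPEC = [
    ("CONSTANT",   r"\d+(?:\.\d+)?"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR",   r"[+\-*/=<>]"),
    ("PUNCT",      r"[;(){}]"),
    ("SKIP",       r"\s+"),
    ("MISMATCH",   r"."),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> List[Tuple[str, str]]:
    """Read the text left to right and return (token kind, lexeme) pairs."""
    tokens: List[Tuple[str, str]] = []
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {text!r}")
        if kind == "IDENTIFIER" and text in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, text))
    return tokens


for token in tokenize("int a = b + 60;"):
    print(token)
```

For the input shown, the scanner yields one keyword, two identifiers, two operators, one constant and one punctuation token.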
3. Semantic Analysis
Semantic analysis checks whether the parse tree constructed follows the rules of
the language.
For example, it checks that values are assigned between compatible data types and
reports operations such as adding a string to an integer.
Also, the semantic analyzer keeps track of identifiers, their types and expressions,
and whether identifiers are declared before use.
Example
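The checks described for semantic analysis can be sketched in Python as follows. The statement format is a hypothetical simplification invented for this example: each statement is either ("declare", type, name) or ("assign", name, type of the assigned value), standing in for a real parse tree.

```python
from typing import Dict, List, Tuple

Statement = Tuple[str, str, str]


def check(statements: List[Statement]) -> List[str]:
    """Return semantic error messages for a list of simplified statements."""
    declared: Dict[str, str] = {}  # name -> declared type
    errors: List[str] = []
    for stmt in statements:
        if stmt[0] == "declare":
            _, type_name, name = stmt
            if name in declared:
                errors.append(f"'{name}' declared more than once")
            else:
                declared[name] = type_name
        elif stmt[0] == "assign":
            _, name, value_type = stmt
            if name not in declared:
                errors.append(f"'{name}' used before declaration")
            elif declared[name] != value_type:
                errors.append(f"cannot assign {value_type} to {declared[name]} '{name}'")
    return errors


program = [
    ("declare", "int", "a"),
    ("declare", "int", "a"),    # duplicate declaration
    ("assign", "b", "int"),     # b was never declared
    ("assign", "a", "string"),  # incompatible types
]
for message in check(program):
    print(message)
```

This sketch treats any type mismatch as an error; real languages usually allow some implicit conversions, such as int to float.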
4. Intermediate Code Generation
After semantic analysis the compiler generates an intermediate code of the source
code for the target machine.
The intermediate code has two properties:
Easy to produce
Easy to translate into the target program
It is in between the high-level language and the machine language, and it should
be generated in such a way that it is easy to translate into the target machine
code.
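One common form of intermediate code is three-address code, where each instruction has at most one operator. The sketch below generates it for an arithmetic expression. As a shortcut, it borrows Python's own ast module to stand in for the compiler's parser, and the function name three_address_code and the temp1, temp2 naming are assumptions for this example.

```python
import ast
from typing import List

OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def three_address_code(target: str, expression: str) -> List[str]:
    """Translate 'target = expression' into three-address instructions."""
    code: List[str] = []
    counter = 0

    def emit(node: ast.expr) -> str:
        nonlocal counter
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left = emit(node.left)
            right = emit(node.right)
            counter += 1
            temp = f"temp{counter}"  # a fresh temporary holds each sub-result
            code.append(f"{temp} = {left} {OPS[type(node.op)]} {right}")
            return temp
        raise ValueError("unsupported expression")

    result = emit(ast.parse(expression, mode="eval").body)
    code.append(f"{target} = {result}")
    return code


for line in three_address_code("id", "id1 + id2 * id3"):
    print(line)
```

Because every sub-expression gets its own temporary, the output is easy to produce but not minimal; reducing it is the job of the optimization phase.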
5. Code Optimization
This phase improves the intermediate code, for example by removing unnecessary
instructions, without changing its meaning. The code generator then takes the
optimized representation of the intermediate code and maps it to the target
machine language.
More detailed example:
The code optimizer optimizes (in this case, reduces) the size of the code where possible.
For the above example,
temp1 = id2 * id3
id = id1 + temp1
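One simple size-reducing optimization can be sketched as follows: if the code ends by computing a temporary and then copying it into the final variable, the two instructions are merged into one. The function name remove_final_copy and the "dest = expr" string format are assumptions for this illustration; the sketch looks only at the last two instructions, whereas a real optimizer analyses the whole program.

```python
from typing import List


def remove_final_copy(code: List[str]) -> List[str]:
    """Merge a trailing 't = <expr>' / 'x = t' pair into 'x = <expr>'."""
    if len(code) < 2:
        return list(code)
    temp, _, expr = code[-2].partition(" = ")
    target, _, source = code[-1].partition(" = ")
    if source == temp:
        # The temporary is only copied once, so assign the expression directly.
        return code[:-2] + [f"{target} = {expr}"]
    return list(code)


unoptimized = ["temp1 = id2 * id3", "temp2 = id1 + temp1", "id = temp2"]
for line in remove_final_copy(unoptimized):
    print(line)
```

Applied to the three unoptimized instructions in the usage lines, this produces the two-instruction form temp1 = id2 * id3 followed by id = id1 + temp1.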
6. Code Generator
The code generator generates assembly code in terms of registers from the input it
gets from the previous phases.
For the above example (suppose id1, id2 and id3 are held in registers AX, BX and CX
respectively),
the assembly code will be
MUL BX, CX
ADD AX, BX
MOV id, AX
Here AX, BX and CX are registers, and the first operand of each instruction is the
destination.
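The mapping from three-address code to register-level instructions can be sketched as below. It assumes a simplified pseudo-assembly in which "OP R1, R2" means R1 = R1 OP R2 (so the product of BX and CX ends up in BX), that every operand already has a register, and that names starting with "temp" stay in registers while other results are stored to memory. These conventions and the function name generate are assumptions for this example, not a real instruction set.

```python
from typing import Dict, List

MNEMONIC = {"+": "ADD", "-": "SUB", "*": "MUL"}


def generate(code: List[str], registers: Dict[str, str]) -> List[str]:
    """Translate 'dest = left op right' lines into pseudo-assembly."""
    location = dict(registers)  # name -> register currently holding its value
    asm: List[str] = []
    for line in code:
        dest, _, expr = line.partition(" = ")
        left, op, right = expr.split()
        reg = location[left]
        asm.append(f"{MNEMONIC[op]} {reg}, {location[right]}")
        location[dest] = reg  # the result overwrites the left operand's register
        if not dest.startswith("temp"):
            asm.append(f"MOV {dest}, {reg}")  # store a named result back to memory
    return asm


tac = ["temp1 = id2 * id3", "id = id1 + temp1"]
for instruction in generate(tac, {"id1": "AX", "id2": "BX", "id3": "CX"}):
    print(instruction)
```

Note that this sketch overwrites the register of the left operand, so the original values of id2 and id1 are lost; a real code generator performs register allocation to avoid destroying values that are still needed.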
Error
During the compilation process each phase can encounter errors and has to deal with
them. For this purpose the compiler contains error-handling routines.
2. Syntax Errors:
When the stream of tokens violates the grammar rules of a
language, it is a syntax error.
Example:
Statements not ended with a semicolon (;)
S= C+/d;
I= a/-b;
Unbalanced parentheses or curly braces
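Detecting unbalanced parentheses or braces, one of the syntax errors listed above, can be sketched with a stack. The function name first_bracket_error and the message wording are assumptions for this example; a real parser finds such errors as part of checking the whole grammar, and would also skip brackets inside strings and comments.

```python
from typing import List, Optional, Tuple

PAIRS = {")": "(", "}": "{", "]": "["}


def first_bracket_error(source: str) -> Optional[str]:
    """Return a message for the first bracket mismatch, or None if balanced."""
    stack: List[Tuple[str, int]] = []  # (opening bracket, position)
    for position, char in enumerate(source):
        if char in PAIRS.values():
            stack.append((char, position))
        elif char in PAIRS:
            if not stack or stack[-1][0] != PAIRS[char]:
                return f"unexpected '{char}' at position {position}"
            stack.pop()
    if stack:
        char, position = stack[-1]
        return f"unclosed '{char}' at position {position}"
    return None


print(first_bracket_error("{ a = (b + c); }"))
print(first_bracket_error("a = (b + c;"))
```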
Types of Errors
3. Semantic Errors:
Semantic errors occur when the operations of the source
program are not meaningful.
Example:
Declaration of the same identifier multiple times in the
same scope, i.e.
int a;
int a;
Undeclared variables, double declaration of a variable.
4. Logical Errors: A logical error is an error in the algorithm of the
source program that causes it to produce undesired output.
A logical error occurs due to a poor understanding of the problem.
For instance, if you want to find the modulo of a certain number
(e.g. a % 4) but instead you write the program for division (e.g. a / 4), then
this type of error is considered a logical error.
For example, 10 % 3 is 1 because 10 divided by 3 leaves a
remainder of 1.
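The modulo-versus-division mistake can be shown in a few lines of Python. Both functions are hypothetical and written only for this illustration. Each one runs without any error message, which is why the translator cannot detect this kind of mistake.

```python
def remainder_intended(a: int) -> int:
    """What the programmer wanted: the remainder of a divided by 4."""
    return a % 4


def remainder_buggy(a: int) -> int:
    """Logical error: integer division was written instead of modulo."""
    return a // 4


print(10 % 3)                 # remainder of 10 divided by 3
print(remainder_intended(9))  # correct remainder
print(remainder_buggy(9))     # runs fine but gives the quotient instead
```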
Example:
Infinite Loop
Division by zero
5. Spurious Errors:
Spurious errors are errors introduced by the compiler itself
during error recovery.