
COMPILER DESIGN

(CS 1703)
CHAPTER 1
COMPILER FUNDAMENTALS, COMPILER STRUCTURE
prepared by Chitrapriya, CSE Dept

Objectives:

• Offer complete knowledge of compiler design.
• Develop a working compiler in parts. Topics include:
compiler structure, symbol tables, regular expressions and languages,
finite automata, lexical analysis, context-free languages,
recursive-descent parsing, semantic analysis and code generation.
• Enable learners to use formal attributed grammars for specifying the
syntax and semantics of programming languages, and to understand their impact on
compiler design.

Why Should We Study Compiler Design?

1. Better understanding of programming language concepts
2. Wide applicability
  • Transforming “data” is very common
  • Many useful data structures and algorithms
3. Brings together
  • Data structures & algorithms
  • Formal languages
  • Computer architecture
4. Influence
  • Language design
  • Architecture

Issues Driving Compiler Design

• Correctness
• Speed (runtime and compile time)
  • Degrees of optimization
  • Multiple passes
• Space
• Feedback to user
• Debugging

Learning outcomes

• Understand the structure of compilers.
• Understand the basic techniques used in compiler construction, such as lexical
analysis, top-down and bottom-up parsing, context-sensitive analysis, and intermediate
code generation.
• Understand the basic data structures and components used in compiler construction,
such as abstract syntax trees, symbol tables, three-address code, code optimizers and
stack machines.

Learning outcomes

• Design and implement a compiler using a software engineering approach.
• Use generators (e.g. Lex and Yacc).
• Design different types of parsers for a given grammar.
• Be able to design their own compiler.

Text Books to Study
• A.V. Aho, R. Sethi, J.D. Ullman, “Compilers: Principles, Techniques and Tools”,
Addison-Wesley.

CONTENTS: COMPILER STRUCTURE
1. Introduction
2. Compiler & Interpreter
3. Compiler Vs Interpreter
4. Analysis-synthesis model of compilation
5. Various phases of a compiler
6. Compiler construction tools

1. Introduction
• To reduce the complexity of designing and building computers, nearly all of
them are made to execute relatively simple commands (but to do so very quickly).
• Programming languages are notations for describing computations to people and to
machines.
• All the software running on all computers was written in some programming
language.
• A program for a computer must be built by combining these simple commands into a
program in what is called machine language.

Introduction contd..
• Since this is a tedious and error-prone process, most programming is instead done
using a high-level programming language.
• This language can be very different from the machine language that the computer
can execute, so some means of bridging the gap is required. This is where the
compiler comes in.

2. COMPILER
• A compiler translates (or compiles) a program written in a high-level programming
language that is suitable for human programmers into the low-level machine
language that is required by computers.
• During this process, the compiler will also attempt to spot and report obvious
programmer mistakes.

• An important role of the compiler is to report any errors in the source program that it
detects during the translation process.

SOURCE PROGRAM → COMPILER → TARGET PROGRAM (Executable Code)
INPUT → TARGET PROGRAM → OUTPUT

Fig 1: COMPILER

Example: Compiler compiling a statement of a program

x = a + b * 10  →  COMPILER  →

MOV id3, R2
MUL #10.0, R2
MOV id2, R1
ADD R2, R1
MOV R1, id1

(Here id1, id2 and id3 stand for x, a and b.)

Why Use a High-Level Programming Language?
• Compared to machine language, the notation used by programming languages is
closer to the way humans think about problems.
• The compiler can spot some obvious programming mistakes.
• Programs written in a high-level language tend to be shorter than equivalent
programs written in machine language.
• The same program can be compiled to many different machine languages and,
hence, be brought to run on many different machines.

Notes:
• Programs that are written in a high-level language and automatically translated to
machine language may run somewhat slower than programs that are hand-coded in
machine language. Hence, some time-critical programs are still written partly in
machine language.

• A good compiler will, however, be able to get very close to the speed of hand-written
machine code when translating well-structured programs.

2. INTERPRETER
• An interpreter is another kind of language processor. Like a compiler, it takes a
program written in a high-level language as its input.
• Instead of producing a target program as a translation, an interpreter appears to
directly execute the operations specified in the source program on inputs supplied
by the user.
• Many interpreters first translate high-level instructions into an intermediate form,
which they then execute.
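
The idea of translating each statement into an intermediate form and executing it at once can be sketched in Python. This is only an illustration under stated assumptions: the toy language (statements of the form `name = operand op operand` and `print name`) and the function names `translate`, `value_of` and `interpret` are hypothetical and not part of the slides.

```python
import operator

# Binary operators of the hypothetical toy language.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def translate(statement):
    """Translate one source statement into a tuple (the intermediate form)."""
    parts = statement.split()
    if len(parts) == 2 and parts[0] == "print":
        return ("print", parts[1])
    if len(parts) == 5 and parts[1] == "=" and parts[3] in OPS:
        return ("assign", parts[0], parts[3], parts[2], parts[4])
    raise SyntaxError(f"cannot translate: {statement!r}")


def value_of(operand, env):
    """An operand is either an integer literal or a variable name."""
    return int(operand) if operand.isdigit() else env[operand]


def interpret(source, env=None):
    """Translate and execute statements one at a time; return printed values."""
    env = {} if env is None else env
    output = []
    for line in source.strip().splitlines():
        instr = translate(line)  # only this statement is translated
        if instr[0] == "assign":
            _, target, op, left, right = instr
            env[target] = OPS[op](value_of(left, env), value_of(right, env))
        else:
            output.append(env[instr[1]])
    return output


program = """
x = 2 + 3
y = x * 10
print y
"""
print(interpret(program))  # [50]
```

Because each statement is executed as soon as it is translated, an error such as an undefined variable is only reported when execution reaches the statement that uses it.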

SOURCE PROGRAM + INPUT → INTERPRETER → OUTPUT

Fig 2: INTERPRETER

3. COMPILER Vs INTERPRETER
COMPILER | INTERPRETER
The machine-language target program produced by a compiler is usually much faster than an interpreter at mapping inputs to outputs. | An interpreter can usually give better error diagnostics than a compiler, because it executes the source program statement by statement.
A compiler reads the whole source code at once, creates tokens, checks semantics, generates intermediate code, translates the whole program, and may involve many passes. | An interpreter reads a statement from the input, converts it to an intermediate code, executes it, then takes the next statement in sequence.
A compiler reads the whole program even if it encounters several errors. | If an error occurs, an interpreter stops execution and reports it.

COMPILER Vs INTERPRETER
Compiler | Interpreter
Scans the entire program and translates it as a whole into machine code. | Translates the program one statement at a time.
It takes a large amount of time to analyze the source code, but the overall execution time is comparatively faster. | It takes less time to analyze the source code, but the overall execution time is slower.
Generates intermediate object code, which further requires linking and hence requires more memory. | No intermediate object code is generated, hence it is memory efficient.
It generates error messages only after scanning the whole program. Hence debugging is comparatively hard. | Continues translating the program until the first error is met, in which case it stops. Hence debugging is easy.
Programming languages like C and C++ use compilers. | Programming languages like Python and Ruby use interpreters.

EXAMPLE: Java language processors
• Java language processors combine compilation and interpretation (hybrid compiler).
• A Java source program may first be compiled into an intermediate form called
bytecodes.
• The bytecodes are then interpreted by a virtual machine.

• In order to achieve faster processing of inputs to outputs, some Java compilers,
called just-in-time compilers, translate the bytecodes into machine language
immediately before they run the intermediate program to process the input
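
The same compile-then-interpret arrangement can be observed in CPython, used here only as an analogy to the Java system described above (an assumption of this sketch, not something taken from the slides). The built-in `compile` produces a code object containing bytecode, the standard `dis` module displays it, and `exec` hands it to the virtual machine.

```python
import dis

source = "result = rate * 60 + initial"

# Step 1: compile the source text into a code object holding bytecode.
code = compile(source, "<example>", "exec")

# Step 2: display the bytecode (the exact instructions differ between versions).
dis.dis(code)

# Step 3: let the virtual machine interpret the bytecode on some input.
namespace = {"rate": 2, "initial": 5}
exec(code, namespace)
print(namespace["result"])  # 2 * 60 + 5 = 125
```

No just-in-time compilation is shown here; the sketch only illustrates the split between the translation step and the virtual machine that runs the intermediate program.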

SOURCE PROGRAM → COMPILER (translate) → Bytecode
Bytecode + input → INTERPRETER (Virtual Machine, bytecode to machine language) → OUTPUT

Fig 3: HYBRID COMPILER (Java Language Processing System)

Language-Processing System
• We have learnt that any computer system is made of hardware and software.
• The hardware understands a language that humans cannot easily understand.
• So we write programs in a high-level language, which is easier for us to understand and
remember.
• These programs are then fed into a series of tools and OS components to get the
desired code that can be used by the machine.
• This is known as a Language Processing System.

Source program
  ↓ Pre-processor
Modified source program
  ↓ Compiler
Target assembly code
  ↓ Assembler
Relocatable machine code
  ↓ Linker/Loader (together with relocatable object files and library files)
Target machine code

Fig 4: Language Processing System


EXAMPLE: How is a program, using a C compiler, executed on a host machine?
1. The user writes a program in the C language (a high-level language).
2. The C compiler compiles the program and translates it into an assembly program
(a low-level language).
3. An assembler then translates the assembly program into machine code (object code).
4. A linker links all the parts of the program together for execution
(executable machine code).
5. A loader loads all of them into memory and then the program is executed.

CONTD…
• Preprocessor
  • A preprocessor, generally considered part of the compiler, is a tool that produces input for
compilers. It deals with macro processing, augmentation, file inclusion, language
extension, etc.
• Assembler
  • An assembler translates assembly language programs into machine code. The output of an
assembler is called an object file, which contains a combination of machine instructions as
well as the data required to place these instructions in memory.

• Linker
  • A linker is a computer program that links and merges various object files together in order to
make an executable file.
  • These files might have been produced by separate assemblers.
  • The major task of a linker is to search for and locate referenced modules/routines in a program
and to determine the memory locations where this code will be loaded, so that the
program's instructions have absolute references.
• Loader
  • The loader is a part of the operating system and is responsible for loading executable files into
memory and executing them.
  • It calculates the size of a program (instructions and data) and creates memory space for it. It
initializes various registers to initiate execution.

QUESTIONS
1. What is the difference between a compiler and an interpreter?
2. What are the advantages of:
(a) a compiler over an interpreter
(b) an interpreter over a compiler?
3. What advantages are there to a language-processing system in which the compiler
produces assembly language rather than machine language?

How to convert a low-level language into a high-level language?
• A program that translates from a low-level language to a higher-level one is
a decompiler.

4. Analysis-synthesis model of compilation
• There are two parts of compilation:

1. Analysis
2. Synthesis

Program in some source language → Front-end analysis → Intermediate code → Back-end synthesis → Executable code for target machine
(The front end and the back end together make up the compiler.)

Fig 5: Analysis-Synthesis Model


Analysis/Front End

• The analysis phase of the compiler reads the source program, divides it into core
parts and then checks for lexical, syntax and semantic errors.
• It generates an intermediate representation of the source program and a symbol
table, which are fed to the synthesis phase as input.

Synthesis/Back End

• It generates the target program with the help of the intermediate code
representation and the symbol table.
• A compiler can have many phases and passes.
• Pass: A pass refers to one traversal of the compiler through the
entire program.
• Phase: A phase of a compiler is a distinguishable stage, which
takes input from the previous stage, processes it and yields output
that can be used as input for the next stage. A pass can have
more than one phase.

5. Various Phases of Compiler
• The compilation process is a sequence of various phases.
• Each phase takes input from its previous stage, has its own representation of the source
program, and feeds its output to the next phase of the compiler.
1. Lexical Analysis
2. Syntax Analysis
3. Semantic Analysis
4. Intermediate Code Generation
5. Code Optimization
6. Code Generation

Lexical Analysis/Scanner
• The first phase of the compiler works as a text scanner.
• This phase scans the source code as a stream of characters
and converts it into meaningful lexemes.
• The lexical analyzer represents these lexemes in the form of
tokens as:

<token_name, attribute_value>

Lexical Analysis/Scanner
• Lexical analysis is the first phase of a compiler. It takes the modified source
code, written in the form of sentences, from language preprocessors.
The lexical analyzer breaks this text into a series of tokens,
removing any whitespace or comments in the source code.
• If the lexical analyzer finds a token invalid, it generates an error. The lexical
analyzer works closely with the syntax analyzer. It reads character streams
from the source code, checks for legal tokens, and passes the data to the
syntax analyzer on demand.

Token
• A lexeme is a sequence of (alphanumeric) characters in the source
program that forms a token.
• There are some predefined rules for every lexeme to be
identified as a valid token.
• These rules are defined by grammar rules, by means of a
pattern. A pattern explains what can be a token, and these
patterns are defined by means of regular expressions.
• In a programming language, keywords, constants, identifiers,
strings, numbers, operators and punctuation symbols can be
considered tokens.

For example
The variable declaration line in the C language

int value = 100;

contains the tokens:
int (keyword), value (identifier), = (operator), 100 (constant) and ; (symbol)
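
A minimal scanner along these lines can be sketched in Python, with each token class described by a regular expression. The token class names follow the example above, but the particular patterns, the tiny keyword list and the function name `tokenize` are assumptions made for illustration, not a full C lexer.

```python
import re

# Each token class is described by a regular expression (its pattern).
# Order matters: keywords must be tried before identifiers.
TOKEN_SPEC = [
    ("keyword",    r"\b(?:int|float|char|return)\b"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("constant",   r"\d+"),
    ("operator",   r"[=+\-*/]"),
    ("symbol",     r"[;(){},]"),
    ("skip",       r"\s+"),
    ("invalid",    r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token_name, lexeme) pairs, dropping whitespace."""
    tokens = []
    for match in MASTER.finditer(text):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "skip":
            continue
        if kind == "invalid":
            raise SyntaxError(f"invalid character {lexeme!r} at position {match.start()}")
        tokens.append((kind, lexeme))
    return tokens


print(tokenize("int value = 100;"))
# [('keyword', 'int'), ('identifier', 'value'), ('operator', '='),
#  ('constant', '100'), ('symbol', ';')]
```

The catch-all `invalid` pattern is what lets the scanner report an error when it meets a character that belongs to no token class.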

Syntax Analysis/ Parsing

• It takes the tokens produced by lexical analysis as input and
generates a parse tree (or syntax tree).
• In this phase, token arrangements are checked against the
source code grammar, i.e. the parser checks if the expression
made by the tokens is syntactically correct.
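
As a hedged illustration, the following Python sketch is a small recursive-descent parser that checks an assignment against a grammar and builds a syntax tree from nested tuples. The grammar, the tuple representation and the name `parse_assignment` are assumptions chosen for this example; the built-in tokenizer is deliberately crude and silently skips characters it does not recognise.

```python
import re


def parse_assignment(text):
    """Parse `name = expr` and return a syntax tree made of nested tuples.

    Grammar assumed for this sketch:
        assignment -> NAME '=' expr
        expr       -> term ('+' term)*
        term       -> factor ('*' factor)*
        factor     -> NAME | NUMBER | '(' expr ')'
    """
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[=+*()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(predicate, what):
        nonlocal pos
        token = peek()
        if token is None or not predicate(token):
            raise SyntaxError(f"expected {what}, found {token!r}")
        pos += 1
        return token

    def factor():
        if peek() == "(":
            expect(lambda t: t == "(", "'('")
            node = expr()
            expect(lambda t: t == ")", "')'")
            return node
        return expect(lambda t: t[0].isalnum() or t[0] == "_", "a name or number")

    def term():
        node = factor()
        while peek() == "*":
            expect(lambda t: t == "*", "'*'")
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            expect(lambda t: t == "+", "'+'")
            node = ("+", node, term())
        return node

    target = expect(lambda t: t[0].isalpha() or t[0] == "_", "a name")
    expect(lambda t: t == "=", "'='")
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse_assignment("x = a + b * 10"))
# ('=', 'x', ('+', 'a', ('*', 'b', '10')))
```

Each grammar rule becomes one function, and the nesting of `term` inside `expr` is what gives `*` a higher precedence than `+` in the resulting tree.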

Semantic Analysis/Type Checking

• Semantic analysis checks whether the parse tree constructed
follows the rules of the language.
• Also, the semantic analyzer keeps track of identifiers, their
types and expressions, and of whether identifiers are declared before
use.
• The semantic analyzer produces an annotated syntax tree as
an output.
• For example, it checks that assignment of values is between compatible data
types, and it reports an error when a string is added to an integer.
• Where the language permits, it inserts a type conversion (coercion), for example
converting an int into a float, shown in the tree as inttofloat.
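
These checks can be sketched in Python as a function that walks an expression tree, looks identifiers up in a table of declared types, and returns an annotated tree. The tuple-based tree, the type names, the convention that string literals start with a double quote, and the function name `check` are all assumptions of this sketch.

```python
NUMERIC = {"int", "float"}


def check(node, symbols):
    """Return (annotated_node, type) for an expression tree of nested tuples.

    Leaves are identifier names (looked up in `symbols`), Python numbers, or
    string literals written with a leading double quote (a sketch convention).
    """
    if isinstance(node, tuple):
        op, left, right = node
        left, left_type = check(left, symbols)
        right, right_type = check(right, symbols)
        if left_type not in NUMERIC or right_type not in NUMERIC:
            raise TypeError(f"cannot apply {op!r} to {left_type} and {right_type}")
        if left_type == right_type:
            return (op, left, right), left_type
        # Mixed int/float operands: coerce the int side to float.
        if left_type == "int":
            left = ("inttofloat", left)
        else:
            right = ("inttofloat", right)
        return (op, left, right), "float"
    if isinstance(node, str):
        if node.startswith('"'):
            return node, "string"
        if node not in symbols:
            raise NameError(f"identifier {node!r} used before declaration")
        return node, symbols[node]
    return node, "float" if isinstance(node, float) else "int"


declared = {"initial": "float", "rate": "float"}
print(check(("+", "initial", ("*", "rate", 60)), declared))
# (('+', 'initial', ('*', 'rate', ('inttofloat', 60))), 'float')
```

The integer constant 60 is wrapped in an `inttofloat` node because it is multiplied by a float, while adding a string to an integer is rejected with a type error.
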
Intermediate Code Generation
• A compiler may construct one or more intermediate
representations, which can have a variety of forms.
• Syntax trees are a form of intermediate representation; they
are commonly used during syntax and semantic analysis.
• After semantic analysis the compiler generates an intermediate
code of the source code for the target machine.
• It represents a program for some abstract machine.
• It is in between the high-level language and the machine
language.
• This intermediate code should be generated in such a way
that it is easy to translate into the target
machine code.
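
One common intermediate form is three-address code, where each instruction has at most one operator on its right-hand side. The Python sketch below flattens an expression tree into such instructions; the tuple-based tree, the temporary names `t1`, `t2`, ... and the function name `to_three_address` are assumptions made for illustration.

```python
from itertools import count


def to_three_address(target, tree):
    """Flatten an expression tree into a list of three-address instructions."""
    code = []
    temps = count(1)

    def emit(node):
        # Leaves (names or constants) are already valid operands.
        if not isinstance(node, tuple):
            return str(node)
        op, left, right = node
        left_operand = emit(left)
        right_operand = emit(right)
        temp = f"t{next(temps)}"
        code.append(f"{temp} = {left_operand} {op} {right_operand}")
        return temp

    code.append(f"{target} = {emit(tree)}")
    return code


for line in to_three_address("position", ("+", "initial", ("*", "rate", 60))):
    print(line)
# t1 = rate * 60
# t2 = initial + t1
# position = t2
```

Every intermediate result gets its own temporary name, which keeps the instructions simple enough to be mapped onto machine instructions later.
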
Code Optimization

• The next phase does code optimization of the intermediate
code.
• Optimization can be thought of as something that removes
unnecessary code lines and arranges the sequence of
statements in order to speed up program execution
without wasting resources (CPU, memory).
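
Two simple optimizations, constant folding and the removal of operations that have no effect, can be sketched in Python on an expression tree. This is only one small example of what an optimizer does; the tuple-based tree and the function name `fold` are assumptions of this sketch.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node):
    """Return an equivalent tree with constant subexpressions precomputed."""
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    left, right = fold(left), fold(right)
    both_constant = isinstance(left, (int, float)) and isinstance(right, (int, float))
    if both_constant:
        return OPS[op](left, right)       # e.g. 2 * 30 becomes 60
    if op == "*" and 1 in (left, right):  # x * 1 is just x
        return right if left == 1 else left
    if op == "+" and 0 in (left, right):  # x + 0 is just x
        return right if left == 0 else left
    return (op, left, right)


print(fold(("+", "initial", ("*", "rate", ("*", 2, 30)))))
# ('+', 'initial', ('*', 'rate', 60))
print(fold(("+", ("*", "x", 1), 0)))
# x
```

The folded tree computes the same value as the original one but needs fewer operations at run time.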

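Fig: Compiler phases (lexical analysis → syntax analysis → semantic analysis → intermediate code generation → code optimization → code generation, with the symbol table and the error handler available to all phases)
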
EXAMPLE:

position = initial + rate * 60

Fig: Translation of an assignment statement

Symbol Table

• It is a data structure maintained throughout all the phases of
a compiler.
• All the identifiers' names along with their types are stored
here.
• The symbol table makes it easier for the compiler to quickly
search for an identifier's record and retrieve it.
• The symbol table is also used for scope management.
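
A symbol table with scope management can be sketched in Python as a stack of dictionaries, one per scope. The class name `SymbolTable`, its method names and the fields stored in each record are assumptions made for this illustration.

```python
class SymbolTable:
    """A stack of scopes; each scope maps an identifier name to its record."""

    def __init__(self):
        self.scopes = [{}]  # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        if len(self.scopes) == 1:
            raise RuntimeError("cannot exit the global scope")
        self.scopes.pop()

    def declare(self, name, type_name):
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"{name!r} is already declared in this scope")
        scope[name] = {"name": name, "type": type_name, "depth": len(self.scopes) - 1}

    def lookup(self, name):
        """Search from the innermost scope outwards; return None if undeclared."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


table = SymbolTable()
table.declare("rate", "float")
table.enter_scope()
table.declare("rate", "int")        # shadows the outer declaration
print(table.lookup("rate")["type"])  # int
table.exit_scope()
print(table.lookup("rate")["type"])  # float
print(table.lookup("missing"))       # None
```

Dictionary lookups give the quick search mentioned above, and pushing or popping a dictionary models entering or leaving a block.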

6. Compiler-Construction Tools
• The compiler writer, like any programmer, can profitably use
tools such as
  • Debuggers, language editors
  • Version managers
  • Profilers and so on.
• In addition to these software-development tools, other more
specialized tools have been developed for helping implement
various phases of a compiler.

Contd…
• These tools use specialized languages for specifying and implementing the
component, and many use algorithms that are quite sophisticated.
• The most successful tools are those that hide the details of the generation
algorithm and produce components that can be easily integrated into the
remainder of a compiler.

• The following is a list of some useful compiler-construction tools:
  • Parser generators
  • Scanner generators
  • Syntax-directed translation engines
  • Automatic code generators
  • Data-flow engines

Compiler-Construction Tools:
• Parser generators
automatically produce syntax analyzers from a grammatical description
(a context-free grammar, CFG) of a programming language.
• Scanner generators
produce lexical analyzers from a regular-expression description of the
tokens of a language.
• Syntax-directed translation engines
produce collections of routines for walking a
parse tree and generating intermediate code.

Compiler-Construction Tools: Contd…
• Automatic code generators:
produce a code generator from a collection of rules for translating each
operation of the intermediate language into the machine language for a
target machine.
• Compiler-construction toolkits:
provide an integrated set of routines for constructing various phases
of a compiler.
• Data-flow engines:
facilitate the gathering of information about how values are transmitted
from one part of a program to each other part.
Data-flow analysis is a key part of code optimization.

THANK YOU
