
Introduction to Compiler

By,

Dr. Anisha P Rodrigues


Associate Professor
Dept of CSE
NMAMIT, Nitte

8/23/2022 Dr. Anisha P Rodrigues, NMAMIT, Nitte 1


Reference:
Compilers : Principles, Techniques and Tools
Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman



What is a Compiler?
A compiler is a program that can read a program in one
language - the source language - and translate it into an
equivalent program in another language - the target
language

Error Messages



What is a Compiler?
Compilation has 2 parts
1. Analysis
2. Synthesis

• Analysis part: Breaks up the source program into
constituent pieces and creates an intermediate
representation of the source program.



What is a Compiler?
Analysis part: This intermediate representation is
a hierarchical structure called a syntax tree.

Syntax tree for



What is a Compiler?
Synthesis part:
• The synthesis part constructs the desired target
program from the intermediate representation and
the information in the symbol table.

The analysis part is often called the front end of the
compiler; the synthesis part is the back end.



What is a Compiler?
Analysis has 3 phases:
1. Lexical Analysis (Linear Analysis)
2. Syntax Analysis (Hierarchical Analysis)
3. Semantic Analysis
Synthesis has 3 phases:
1. Intermediate code generation
2. Code optimization
3. Code generation
Phases of a Compiler

Analysis Phase

Synthesis Phase



Phases of a Compiler
1. Lexical Analysis (scanning)
The first phase of a compiler is called lexical analysis or
scanning.
The lexical analyzer reads the stream of characters making up the source
program and groups them into tokens.

Tokens are:



Phases of a Compiler
1. Lexical Analysis (scanning)

The character sequence forming a token is called the
lexeme.

For example: position
token = identifier
lexeme = position

Some tokens are associated with values.



• For each lexeme, the lexical analyzer produces as
output a token of the form
– <token-name, attribute-value>
that it passes on to the subsequent phase, syntax
analysis.
• In the token, the first component token-name is an
abstract symbol that is used during syntax analysis,
and the second component attribute-value points to an
entry in the symbol table for this token.



• For example, suppose a source program contains the
assignment statement
– position = initial + rate * 60
• The characters in this assignment could be grouped
into the following lexemes and mapped into the
following tokens passed on to the syntax analyzer:

• Blanks separating the lexemes would be discarded by
the lexical analyzer.

• In this representation, the token names =, +, and *
are abstract symbols for the assignment, addition,
and multiplication operators, respectively.
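The grouping of characters into <token-name, attribute-value> pairs can be sketched in a few lines of Python. This is a minimal illustration, not the lexical analyzer of any particular compiler: the token patterns, the function name tokenize, and the use of a plain list as the symbol table are all assumptions made for the example.

```python
import re

# Token patterns are illustrative assumptions, not taken from the slides.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[=+*]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return (tokens, symbol_table) for a source string."""
    symbol_table = []  # entry i-1 holds the lexeme of identifier <id, i>
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue  # blanks separating the lexemes are discarded
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("number", int(lexeme)))
        else:
            tokens.append((lexeme, None))  # operators carry no attribute value
    return tokens, symbol_table


tokens, table = tokenize("position = initial + rate * 60")
print(tokens)
# [('id', 1), ('=', None), ('id', 2), ('+', None), ('id', 3), ('*', None), ('number', 60)]
print(table)
# ['position', 'initial', 'rate']
```

Here the attribute value of an identifier is its 1-based position in the symbol table, so rate becomes <id, 3>.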



Phases of a Compiler
2. Syntax Analysis (parsing)
• The second phase of the compiler is syntax analysis or parsing. The parser
uses the first components of the tokens produced by the lexical analyzer
to create a tree-like intermediate representation that depicts the
grammatical structure of the token stream.

Parse Tree



Phases of a Compiler
2. Syntax Analysis (parsing)

A syntax tree is a compressed representation of a
parse tree.
Each interior node represents an operation and the
children of the node represent the arguments of the
operation.

Syntax tree for the previous parse tree
• The tree has an interior node labeled * with <id, 3>
as its left child and the integer 60 as its right child.
• The node <id, 3> represents the identifier rate.
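One way such a syntax tree might be built is with a small recursive-descent parser. The sketch below is only an illustration under stated assumptions: the grammar in the comments, the class name Parser, and the nested-tuple tree format are invented for this example, and the token list is written out by hand.

```python
# Tokens as (token-name, attribute-value) pairs for:
#   position = initial + rate * 60
TOKENS = [("id", 1), ("=", None), ("id", 2), ("+", None),
          ("id", 3), ("*", None), ("number", 60)]


class Parser:
    """Recursive-descent parser for a tiny, assumed assignment grammar."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # Only the first component (the token name) drives parsing.
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def take(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.peek()!r}")
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def assignment(self):          # assignment -> id = expr
        target = self.take("id")
        self.take("=")
        tree = ("=", target, self.expr())
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):                # expr -> term { + term }
        node = self.term()
        while self.peek() == "+":
            self.take("+")
            node = ("+", node, self.term())
        return node

    def term(self):                # term -> factor { * factor }
        node = self.factor()
        while self.peek() == "*":
            self.take("*")
            node = ("*", node, self.factor())
        return node

    def factor(self):              # factor -> id | number
        if self.peek() == "id":
            return self.take("id")
        return self.take("number")


tree = Parser(TOKENS).assignment()
print(tree)
# ('=', ('id', 1), ('+', ('id', 2), ('*', ('id', 3), ('number', 60))))
```

Because term is parsed below expr, the * node ends up beneath the + node, which matches the usual precedence of multiplication over addition.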
Phases of a Compiler
3. Semantic Analysis
Uses the syntax tree to check the source program for
semantic errors.
The semantic analyzer uses the syntax tree and the
information in the symbol table to check the source
program for semantic consistency with the language
definition.
An important part of semantic analysis is type checking.
For example, given the declaration
int a[20];
the assignment a[2.4] = 12 is a semantic error, because an array
index must be an integer.
Phases of a Compiler
3. Semantic Analysis
It also gathers type information and saves it in either the syntax tree
or the symbol table, for subsequent use during intermediate-code generation.
It can also perform type conversions called coercions.

Suppose that position, initial, and rate have
been declared to be floating-point numbers and
that the lexeme 60 by itself forms an integer.
The type checker in the semantic analyzer
discovers that the operator * is applied to a
floating-point number rate and an integer 60.
In this case the integer is converted to a floating-point
number, and the syntax tree gets an extra node for the
operator inttofloat.
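The coercion step can be sketched as a recursive walk over the syntax tree. This is a simplified illustration: the tuple tree format, the SYMBOL_TYPES mapping, and the rule that assigning a float to an int is rejected are assumptions for the example, not rules stated in the slides.

```python
FLOAT, INT = "float", "int"

# Assumed declarations: position, initial and rate (ids 1..3) are floats.
SYMBOL_TYPES = {1: FLOAT, 2: FLOAT, 3: FLOAT}

# Syntax tree for: position = initial + rate * 60
TREE = ("=", ("id", 1), ("+", ("id", 2), ("*", ("id", 3), ("number", 60))))


def widen(node, node_type, wanted):
    """Wrap an int-valued node in inttofloat when a float is wanted."""
    if node_type == INT and wanted == FLOAT:
        return ("inttofloat", node)
    return node


def check(node):
    """Return (typed_node, type), inserting inttofloat nodes where needed."""
    kind = node[0]
    if kind == "id":
        return node, SYMBOL_TYPES[node[1]]
    if kind == "number":
        return node, INT if isinstance(node[1], int) else FLOAT
    left, left_type = check(node[1])
    right, right_type = check(node[2])
    if kind == "=":
        if left_type == INT and right_type == FLOAT:
            # Assumed rule for this sketch: no implicit narrowing.
            raise TypeError("cannot assign a float to an int variable")
        return (kind, left, widen(right, right_type, left_type)), left_type
    result = FLOAT if FLOAT in (left_type, right_type) else INT
    typed = (kind, widen(left, left_type, result), widen(right, right_type, result))
    return typed, result


typed_tree, tree_type = check(TREE)
print(typed_tree)
# ('=', ('id', 1), ('+', ('id', 2), ('*', ('id', 3), ('inttofloat', ('number', 60)))))
```

Only the integer operand of * is wrapped; the other nodes are already floating-point and are left unchanged.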
Phases of a Compiler
4. Intermediate Code Generation

• After syntax and semantic analysis of the source
program, many compilers generate an explicit low-level
or machine-like intermediate representation.

• This intermediate representation should have two
important properties:

1. it should be easy to produce
2. it should be easy to translate into the target machine



Phases of a Compiler
4. Intermediate Code Generation

Intermediate Code
An example of intermediate code is three-address code,
which consists of a sequence of assembly-like
instructions with three operands per instruction.
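Three-address code for the running assignment can be produced by a post-order walk of the syntax tree, creating one temporary per interior node. The sketch below assumes a tuple-based tree that already contains the inttofloat node; the function names and the textual instruction format are illustrative choices only.

```python
# Typed syntax tree for: position = initial + rate * 60
TREE = ("=", ("id", 1),
        ("+", ("id", 2), ("*", ("id", 3), ("inttofloat", ("number", 60)))))


def generate(tree):
    """Return a list of three-address instructions as strings."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def walk(node):
        """Emit code for node and return the name that holds its value."""
        kind = node[0]
        if kind == "id":
            return f"id{node[1]}"
        if kind == "number":
            return str(node[1])
        if kind == "inttofloat":
            operand = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} = inttofloat({operand})")
            return temp
        left = walk(node[1])
        right = walk(node[2])
        if kind == "=":
            code.append(f"{left} = {right}")
            return left
        temp = new_temp()
        code.append(f"{temp} = {left} {kind} {right}")
        return temp

    walk(tree)
    return code


three_address = generate(TREE)
print("\n".join(three_address))
# t1 = inttofloat(60)
# t2 = id3 * t1
# t3 = id2 + t2
# id1 = t3
```

Each instruction has at most one operator on its right-hand side, and the temporaries t1, t2, t3 hold the intermediate values.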
Phases of a Compiler
5. Code Optimization
• The code-optimization phase attempts to improve the
intermediate code so that better target code will
result.
• Usually better means faster, but other objectives may
be desired, such as shorter code, or target code that
consumes less power.
Optimized code
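As a rough sketch of what such an improvement can look like, the Python code below applies two simple transformations to three-address code for position = initial + rate * 60: it converts the constant 60 to 60.0 at compile time, and it removes a temporary that is only copied into its final destination. The tuple instruction format and the assumption that every temporary is used exactly once are simplifications made for this example; real optimizers rely on far more general analyses.

```python
# (destination, operator, argument 1, argument 2)
CODE = [
    ("t1", "inttofloat", "60", None),
    ("t2", "*", "id3", "t1"),
    ("t3", "+", "id2", "t2"),
    ("id1", "copy", "t3", None),
]


def optimize(code):
    constants = {}  # temporaries known to hold a compile-time constant
    result = []
    for dest, op, arg1, arg2 in code:
        # Replace operands by constants that are already known.
        arg1 = constants.get(arg1, arg1)
        arg2 = constants.get(arg2, arg2)
        if op == "inttofloat" and arg1.isdigit():
            # Do the conversion once, at compile time.
            constants[dest] = str(float(int(arg1)))
            continue
        if (op == "copy" and result and result[-1][0] == arg1
                and arg1.startswith("t")):
            # Assumption: the temporary has no other use, so the previous
            # instruction can write straight into the destination.
            _, prev_op, prev1, prev2 = result.pop()
            result.append((dest, prev_op, prev1, prev2))
            continue
        result.append((dest, op, arg1, arg2))
    return result


def show(instruction):
    dest, op, arg1, arg2 = instruction
    if op == "copy":
        return f"{dest} = {arg1}"
    if arg2 is None:
        return f"{dest} = {op}({arg1})"
    return f"{dest} = {arg1} {op} {arg2}"


optimized = [show(instruction) for instruction in optimize(CODE)]
print("\n".join(optimized))
# t2 = id3 * 60.0
# id1 = id2 + t2
```

Four instructions shrink to two. The sketch does not renumber the surviving temporary, so it keeps the name t2.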



Phases of a Compiler
6. Code Generation
• The code generator takes as input an intermediate
representation of the source program and maps it into
the target language.
• Target code may be assembly code or machine code
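A very naive mapping from three-address code to assembly-like target code might look as follows. The mnemonics (LDF, MULF, ADDF, STF), the register names, and the "#" prefix for immediate constants are assumptions in the style of textbook examples and do not describe a real instruction set. The sketch also never reuses a register, which a real code generator would do.

```python
# Optimized three-address code: (destination, operator, argument 1, argument 2)
CODE = [
    ("t1", "*", "id3", "60.0"),
    ("id1", "+", "id2", "t1"),
]

OPCODES = {"+": "ADDF", "*": "MULF"}  # hypothetical floating-point opcodes


def generate(code):
    asm = []
    location = {}  # temporary -> register that currently holds it
    next_reg = 1

    def in_register(name):
        """Return a register holding name, loading it if necessary."""
        nonlocal next_reg
        if name in location:
            return location[name]
        reg = f"R{next_reg}"
        next_reg += 1
        source = f"#{name}" if name[0].isdigit() else name
        asm.append(f"LDF {reg}, {source}")
        return reg

    for dest, op, arg1, arg2 in code:
        left = in_register(arg1)
        # A constant second operand is used as an immediate value.
        right = f"#{arg2}" if arg2[0].isdigit() else in_register(arg2)
        asm.append(f"{OPCODES[op]} {left}, {left}, {right}")
        if dest.startswith("t"):
            location[dest] = left          # the temporary lives in a register
        else:
            asm.append(f"STF {dest}, {left}")  # store into the variable
    return asm


target = generate(CODE)
print("\n".join(target))
# LDF R1, id3
# MULF R1, R1, #60.0
# LDF R2, id2
# ADDF R2, R2, R1
# STF id1, R2
```

Temporaries never reach memory in this sketch; only the final result is stored back into the variable id1.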



Phases of a Compiler
Symbol-Table Management
• An essential function of a compiler is to record the
variable names used in the source program and
collect information about various attributes of each
name.
• A symbol table is a data structure containing a
record for each variable name, with fields for the
attributes of the name.
Attributes may provide information about:
• storage allocated for a variable
• data type
• scope of the variable
Phases of a Compiler
Symbol-Table Management

In the case of procedure names, attributes may provide
information about
• the number and types of its arguments
• the method of passing each argument (for
example, by value or by reference)
• the type returned
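A symbol table with these attribute fields could be sketched as a dictionary of records. The field names, the Entry and SymbolTable classes, and the sample values below are hypothetical; they show only the shape of the data structure, with one record per name and attributes that can be filled in after the record has been created.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    name: str
    kind: str = "variable"                # "variable" or "procedure"
    data_type: Optional[str] = None       # declared type, or return type
    scope: Optional[str] = None
    storage_offset: Optional[int] = None  # where the variable is stored
    param_types: list = field(default_factory=list)  # procedures only
    param_modes: list = field(default_factory=list)  # "value" / "reference"


class SymbolTable:
    def __init__(self):
        self.entries = {}

    def enter(self, name):
        """Create the record on first sight of a name; attributes stay unset."""
        return self.entries.setdefault(name, Entry(name))

    def lookup(self, name):
        return self.entries[name]


table = SymbolTable()

# A variable: the record is created first, attributes are added later.
rate = table.enter("rate")
rate.data_type = "float"
rate.scope = "global"
rate.storage_offset = 16

# A procedure: argument types, passing modes and return type.
area = table.enter("area")
area.kind = "procedure"
area.param_types = ["float", "float"]
area.param_modes = ["value", "value"]
area.data_type = "float"

print(table.lookup("rate"))
print(table.lookup("area").param_types)
```

Entering the same name twice returns the existing record, so every phase works on a single shared entry per name.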



Phases of a Compiler
Symbol-Table Management
• When an identifier in the source program is detected
by the lexical analyzer, the identifier is entered into the
symbol table. However, its attributes cannot be
determined during lexical analysis. They are entered
during the later phases.

• Ex:



Phases of a Compiler
Error detection
• Each phase can encounter errors. If an error is
detected in a phase, the phase must somehow deal with that
error so that compilation can proceed and further errors
can be detected.
• Errors can occur during:
• Lexical analysis
• Syntax analysis
• Semantic analysis



Phases of a Compiler



Cousins of the compiler

A compiler is assisted by

1. Preprocessors
2. Assemblers
3. Linkers/loaders



Cousins of the compilers
A language-processing system



Cousins of the compilers
1. Preprocessors
• A preprocessor is a program that processes its input data to
produce output that is used as input to another program.
• The output is said to be a preprocessed form of the input data,
which is often used by some subsequent programs like
compilers.
• Preprocessors produce input to compilers. Their tasks are:

1. Macro processing
2. File inclusion
• E.g., in C: #include <global.h>
3. Rational preprocessors
4. Language extensions



Cousins of the compilers
Preprocessors
Macro processing:
• A macro is a rule or pattern that specifies how a
certain input sequence should be mapped to an output
sequence according to a defined procedure.
• The mapping process that instantiates a macro into a
specific output sequence is known as macro expansion.

File Inclusion:
• The preprocessor includes header files into the program
text.
• When the preprocessor finds an #include directive, it
replaces it by the entire content of the specified file.
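Both tasks can be illustrated with a toy preprocessor. This is a deliberately small sketch and not how the C preprocessor is implemented: it supports only object-like #define macros, reads "header files" from an in-memory dictionary with made-up contents, does not rescan expanded text, and would also expand macro names inside string literals.

```python
import re

# In-memory stand-in for header files on disk (hypothetical contents).
HEADERS = {"global.h": "#define MAX 20\nint total;"}


def preprocess(text, macros=None):
    """Return the preprocessed text as a list of lines."""
    macros = {} if macros is None else macros
    output = []
    for line in text.splitlines():
        include = re.match(r'\s*#include\s+[<"]([^>"]+)[>"]', line)
        define = re.match(r"\s*#define\s+(\w+)\s+(.*)", line)
        if include:
            # File inclusion: the directive is replaced by the file content.
            output.extend(preprocess(HEADERS[include.group(1)], macros))
        elif define:
            # Record the macro: name -> replacement text.
            macros[define.group(1)] = define.group(2).strip()
        else:
            # Macro expansion: replace every word that names a macro.
            expanded = re.sub(r"\w+", lambda m: macros.get(m.group(), m.group()), line)
            output.append(expanded)
    return output


source = "#include <global.h>\nint a[MAX];"
lines = preprocess(source)
print("\n".join(lines))
# int total;
# int a[20];
```

The macro MAX is defined inside the included file and still expands in the including file, because both share the same macro table.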
Cousins of the compilers
Preprocessors
Rational Preprocessors
• Augment older languages with more modern flow-of-
control and data-structuring facilities.

Language extensions
• These processors attempt to add capabilities to the
language by what amounts to built-in macros.
• For example, the language Equel (Embedded QUEL) is a
database query language embedded in C.



The Grouping of Phases into Passes
Phases of a compiler are collected into front end and back end

Front end

Back end



The Grouping of Phases into Passes
In an implementation, activities from several phases
may be grouped together into a pass that reads an
input file and writes an output file

Example

One Pass



The Grouping of Phases into Passes

optional

Back end pass



• With these collections, we can produce
compilers for different source languages for
one target machine by combining different
front ends with the back end for that target
machine.

• Similarly, we can produce compilers for
different target machines by combining a
front end with back ends for different target
machines.
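The idea of mixing front ends and back ends through a shared intermediate representation can be shown with a toy example. Everything here is hypothetical: the two "source languages" are just infix and postfix notation for a single binary operation, the intermediate representation is a tuple, and the two "target machines" are an invented stack machine and an invented register machine.

```python
def infix_front_end(source):
    """Source language 1: 'a + b'. Produces the shared IR (op, left, right)."""
    left, op, right = source.split()
    return (op, left, right)


def postfix_front_end(source):
    """Source language 2: 'a b +'. Produces the same IR."""
    left, right, op = source.split()
    return (op, left, right)


def stack_back_end(ir):
    """Target 1: a hypothetical stack machine."""
    op, left, right = ir
    return [f"push {left}", f"push {right}", {"+": "add", "*": "mul"}[op]]


def register_back_end(ir):
    """Target 2: a hypothetical register machine."""
    op, left, right = ir
    opcode = {"+": "ADD", "*": "MUL"}[op]
    return [f"LD R1, {left}", f"LD R2, {right}", f"{opcode} R1, R1, R2"]


def make_compiler(front_end, back_end):
    """Combine any front end with any back end."""
    return lambda source: back_end(front_end(source))


infix_to_stack = make_compiler(infix_front_end, stack_back_end)
postfix_to_register = make_compiler(postfix_front_end, register_back_end)

print(infix_to_stack("a + b"))
# ['push a', 'push b', 'add']
print(postfix_to_register("a b +"))
# ['LD R1, a', 'LD R2, b', 'ADD R1, R1, R2']
```

Two front ends and two back ends give four compilers, although only four components had to be written.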



Compiler-Construction Tools

• Specialized tools have been created to help
implement various phases of a compiler.
• Some commonly used compiler-construction
tools include:

1. Scanner generators
2. Parser generators
3. Syntax-directed translation engines
4. Automatic code generators
5. Data-flow engines



Compiler-Construction Tools
1. Scanner generators
• Produce lexical analyzers from a regular-expression
description of the tokens of a language.

• The basic organization of the resulting lexical
analyzer is in effect a finite automaton.

E.g. : LEX
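To make the finite-automaton idea concrete, the sketch below hand-codes a small deterministic automaton of the kind a scanner generator might derive from two regular expressions, letter (letter | digit)* for identifiers and digit+ for numbers. The state names, the transition table, and the function names are assumptions for illustration; this is not output produced by LEX.

```python
def char_class(ch):
    """Map a character to the input class used by the automaton."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


# Transition table: state -> character class -> next state.
TRANSITIONS = {
    "start": {"letter": "in_id", "digit": "in_number"},
    "in_id": {"letter": "in_id", "digit": "in_id"},
    "in_number": {"digit": "in_number"},
}

# Accepting states and the token name each one recognizes.
ACCEPTING = {"in_id": "id", "in_number": "number"}


def longest_match(text, start=0):
    """Run the automaton from start; return (token_name, lexeme) or None."""
    state = "start"
    pos = start
    while pos < len(text):
        next_state = TRANSITIONS[state].get(char_class(text[pos]))
        if next_state is None:
            break  # no transition: the lexeme ends here
        state = next_state
        pos += 1
    if state in ACCEPTING:
        return ACCEPTING[state], text[start:pos]
    return None


print(longest_match("rate*60"))     # ('id', 'rate')
print(longest_match("rate*60", 5))  # ('number', '60')
print(longest_match("*"))           # None
```

The automaton keeps consuming characters as long as a transition exists, which gives the longest-match behavior expected of a lexical analyzer.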



Compiler-Construction Tools
2. Parser generators
• Automatically produce syntax analyzers from a
grammatical description of a programming language.

• With these tools, this phase is now one of the
easiest to implement.

E.g. : YACC



Compiler-Construction Tools
3. Syntax-directed translation engines
• Produce collections of routines for walking a parse
tree and generating intermediate code.

4. Automatic code generators
• Produce a code generator from a collection of rules
for translating each operation of the intermediate
language into the machine language for a target
machine.
Compiler-Construction Tools
5. Data flow engines

• These facilitate the gathering of information about
how values are transmitted from one part of a
program to each other part.

• Data-flow analysis is a key part of code
optimization.
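One standard data-flow analysis is liveness: for each instruction, which names hold values that are still needed later. The sketch below computes it for straight-line code only, walking the instructions backwards. The instruction format (destination plus the names it uses) and the function name are assumptions for this example; real data-flow engines also handle branches and loops by iterating to a fixed point.

```python
# Each instruction: (name defined, names used)
CODE = [
    ("t1", ("id3",)),         # t1  = id3 * 60.0
    ("id1", ("id2", "t1")),   # id1 = id2 + t1
]


def liveness(code, live_out):
    """Return, for each instruction, the set of names live just before it."""
    live = set(live_out)
    live_before = []
    for dest, uses in reversed(code):
        # A definition ends the liveness of dest; each use makes a name live.
        live = (live - {dest}) | set(uses)
        live_before.append(set(live))
    live_before.reverse()
    return live_before


# Assumption: only id1 is needed after the last instruction.
result = liveness(CODE, live_out={"id1"})
for (dest, uses), live in zip(CODE, result):
    print(dest, sorted(live))
# t1 ['id2', 'id3']
# id1 ['id2', 't1']
```

The temporary t1 is live only between the two instructions, which is the kind of fact an optimizer can use to decide how long a register must be reserved.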



THANK YOU
