
Module-03
• Introduction to Compiler Design
• The Structure of Compiler
• The Science of Building a Compiler
• Bootstrapping and Cross compiler
• The role of the Lexical analyzer
• Input Buffering
• Specification of Tokens
• Recognition of Tokens
• The Lexical Analyzer Generator (LEX/FLEX)
Session-01
Introduction to Compiler Design

• Preprocessing is the first pass of any C compilation. It
processes include files, conditional compilation
instructions and macros.
• Compilation is the second pass. It takes the output of the
preprocessor (the expanded source code) and generates
assembly source code.
• Assembly is the third stage of compilation. It takes the
assembly source code and translates it into machine code,
which is stored in an object file. An assembly listing
with offsets can also be produced.
• Linking is the final stage of compilation. It takes one or
more object files or libraries as input and combines them
to produce a single (usually executable) file. In doing so, it
resolves references to external symbols, assigns final
addresses to procedures/functions and variables, and
revises code and data to reflect new addresses (a process
called relocation).
• Preprocessor:

The C preprocessor is not part of the compiler proper; it is a
separate step in the compilation process. In simple terms, the
C preprocessor is a text substitution tool: its directives
specify the pre-processing to be done before actual
compilation begins. The preprocessor processes:
• Include files
• Conditional compilation instructions
• Macros
Directive   Description
#include    Inserts a particular header from another file
#define     Substitutes a preprocessor macro
#undef      Undefines a preprocessor macro
#if         Tests if a compile time condition is true
#ifdef      Returns true if this macro is defined
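
The directives listed above can be seen working together in a small sketch. The macro names (BUFFER_SIZE, SQUARE, SIZE_CLASS, DEBUG, LOG) and the three helper functions are hypothetical and chosen only for illustration.

```c
#include <stdio.h>              /* #include: inserts the contents of stdio.h */

#define BUFFER_SIZE 64          /* #define: BUFFER_SIZE is replaced by 64 */
#define SQUARE(x) ((x) * (x))   /* a macro that takes a parameter */

#undef BUFFER_SIZE              /* #undef: the old definition is forgotten */
#define BUFFER_SIZE 128         /* so the name can be defined again */

#if BUFFER_SIZE > 100           /* #if: the condition is tested at compile time */
#define SIZE_CLASS "large"
#else
#define SIZE_CLASS "small"
#endif

#ifdef DEBUG                    /* #ifdef: true only if DEBUG has been defined */
#define LOG(msg) printf("debug: %s\n", (msg))
#else
#define LOG(msg) ((void)0)      /* logging disappears from non-debug builds */
#endif

/* Hypothetical helpers that expose what the preprocessor substituted. */
int buffer_size(void) { LOG("buffer_size called"); return BUFFER_SIZE; }
const char *size_class(void) { return SIZE_CLASS; }
int square_of(int n) { return SQUARE(n); }
```

Since DEBUG is not defined in this file, the LOG calls expand to nothing; defining DEBUG on the compiler command line would switch them on without touching the source.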

Consider the following program:

    #include <stdio.h>
    int main()
    {
        printf("whatever");
        return 0;
    }

The preprocessor includes the contents of the header file
in the code. The compiler and assembler then produce an
object file, and finally the linker combines this object file
with the library object code that contains the actual
implementation of printf().
Session-02, 03
The Structure of Compiler
The Science of Building a Compiler

Session-04
Bootstrapping and Cross compiler

Bootstrapping
• Bootstrapping is widely used in compiler development.
• Bootstrapping is used to produce a self-hosting compiler. A
self-hosting compiler is a compiler that can compile its own
source code.
• A bootstrap compiler is used to compile the compiler; the
compiled compiler can then be used to compile everything else
as well as future versions of itself.
A compiler can be characterized by three languages:
1. Source Language
2. Target Language
3. Implementation Language
The T-diagram shows a compiler that translates source language S
to target language T and is implemented in language I.

More Examples of Bootstrapping

CROSS COMPILER

A cross compiler is a compiler capable of creating
executable code for a platform other than the one on which
the compiler is running. For example, a cross compiler
executes on machine X and produces machine code for
machine Y.

LEXICAL ANALYSIS
Outline
Role of lexical analyzer
Specification of tokens
Recognition of tokens
Lexical analyzer generator
Finite automata
Design of lexical analyzer generator
The role of lexical analyzer

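[Figure: Source program -> Lexical Analyzer -> Parser -> to semantic analysis. The parser calls getNextToken and the lexical analyzer returns a token. Both the lexical analyzer and the parser use the symbol table.]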
Lexemes, Tokens, and Patterns
Lexeme: A sequence of input characters that
makes up a single token is called a lexeme.

Token: A token is a class of similar lexemes,
all of which are identified by the same token name.

Pattern: A pattern is a rule which describes the form
that the lexemes of a token may take.
For example, an identifier has a pattern which describes it
as starting with a letter, followed by any number of letters
or digits.
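
As a minimal sketch of the identifier pattern described above (a letter followed by letters or digits), the check can be written in C. The function name is_identifier is an assumption made for this example, not something defined in these notes.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when the whole string s matches the pattern
   "a letter followed by zero or more letters or digits". */
bool is_identifier(const char *s)
{
    if (s == NULL || !isalpha((unsigned char)s[0]))
        return false;                 /* must start with a letter */
    for (size_t i = 1; s[i] != '\0'; i++)
        if (!isalnum((unsigned char)s[i]))
            return false;             /* only letters or digits may follow */
    return true;
}
```

Under this pattern, temp and x1 are lexemes of the identifier token, while 1x is not.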
Lexical Errors
A character sequence that cannot be scanned into any
valid token is a lexical error. Important facts about lexical
errors:

• Misspellings of identifiers, operators and keywords are
considered lexical errors.
• Generally, a lexical error is caused by the appearance of
some illegal character, mostly at the beginning of a token.

Lexical Analysis
Lexical analyzer: reads input characters and produces a
sequence of tokens as output (nexttoken()).
It tries to make sense of each element in a program.
Token: a group of characters having a collective meaning.

Example: double pi = 3.14159;

Token 1: (keyword, double)
Token 2: (identifier, pi)
Token 3: (operator, =)
Token 4: (realnumber, 3.14159)
Token 5: (delimiter, ;)
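
The token list above can be reproduced with a small hand-written scanner. This is only a sketch under simplifying assumptions: the only keyword it knows is double, every number is classified as realnumber, and the names TokenClass, take and next_token are invented for the example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { T_KEYWORD, T_IDENTIFIER, T_OPERATOR, T_REALNUMBER,
               T_DELIMITER, T_END, T_ERROR } TokenClass;

/* Copies *p into lexeme (if room remains) and returns the next position. */
static const char *take(const char *p, char *lexeme, size_t *n, size_t cap)
{
    if (*n + 1 < cap)
        lexeme[(*n)++] = *p;
    return p + 1;
}

/* Reads one token starting at *src, stores its lexeme (cap must be >= 1),
   advances *src past the lexeme, and returns the token class. */
TokenClass next_token(const char **src, char *lexeme, size_t cap)
{
    const char *p = *src;
    size_t n = 0;
    TokenClass cls;

    while (isspace((unsigned char)*p))      /* white space is discarded */
        p++;

    if (*p == '\0') {
        cls = T_END;
    } else if (isalpha((unsigned char)*p)) {
        while (isalnum((unsigned char)*p))
            p = take(p, lexeme, &n, cap);
        lexeme[n] = '\0';
        cls = strcmp(lexeme, "double") == 0 ? T_KEYWORD : T_IDENTIFIER;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))
            p = take(p, lexeme, &n, cap);
        if (*p == '.' && isdigit((unsigned char)p[1])) {
            p = take(p, lexeme, &n, cap);   /* the decimal point */
            while (isdigit((unsigned char)*p))
                p = take(p, lexeme, &n, cap);
        }
        cls = T_REALNUMBER;
    } else {
        cls = *p == '=' ? T_OPERATOR : *p == ';' ? T_DELIMITER : T_ERROR;
        p = take(p, lexeme, &n, cap);
    }
    lexeme[n] = '\0';
    *src = p;
    return cls;
}
```

Calling next_token repeatedly on the string "double pi = 3.14159;" yields the five tokens in the order listed, followed by T_END.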

09/14/2023
Basic functions of Lexical Analysis:

• Removing white spaces and comments from the source program.

• Identifying tokens in the program such as variables, keywords.

• Identifying constants.

• Creating storage for identifiers in the symbol table.


Input buffering
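Sometimes the lexical analyzer needs to look ahead some
symbols to decide which token to return.
In C: after reading -, = or < we need to look at the next
character to decide what token to return (for example,
< versus <=).
In Fortran: in DO n = 1.10 the analyzer cannot tell what DO
means until it has scanned far enough ahead to see whether
a dot or a comma follows the 1.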
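
A minimal sketch of one-character lookahead for the C operators mentioned above. It is deliberately simplified: it only distinguishes one-character from two-character operators (so three-character operators such as <<= are not handled), and the function name operator_length is hypothetical.

```c
#include <stddef.h>

/* Returns the length (1 or 2) of the operator lexeme that starts at p,
   or 0 if p does not start with '-', '=' or '<'.
   The decision is made by peeking at one lookahead character. */
size_t operator_length(const char *p)
{
    char next = (p[0] != '\0') ? p[1] : '\0';   /* the lookahead character */

    switch (p[0]) {
    case '-':   /* -  --  -=  -> */
        return (next == '-' || next == '=' || next == '>') ? 2 : 1;
    case '=':   /* =  == */
        return (next == '=') ? 2 : 1;
    case '<':   /* <  <=  << */
        return (next == '=' || next == '<') ? 2 : 1;
    default:
        return 0;
    }
}
```

When the lookahead character does not extend the operator, it is left in the input so that it can start the next token.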
Input buffering uses two pointers:
• Lexeme Begin Pointer (LB): marks the start of the current lexeme
• Forward Pointer (FW): scans ahead until a pattern is matched
Buffering schemes:
• Single Buffer Input Buffering
• Two Buffer Input Buffering
• Sentinel Symbol (eof)
Buffer contents (one character per cell): i n t v o i d m a i n ( ) { }
The Lexeme Begin Pointer is updated after white space (ws) or a delimiter is received.
Algorithm for the Forward Pointer in two-buffer
input buffering with a sentinel symbol
    switch (*forward++) {
    case eof:
        if (forward is at end of first buffer) {
            reload second buffer;
            forward = beginning of second buffer;
        }
        else if (forward is at end of second buffer) {
            reload first buffer;
            forward = beginning of first buffer;
        }
        else /* eof within a buffer marks the end of input */
            terminate lexical analysis;
        break;
    cases for the other characters;
    }
Why separate lexical analysis and parsing?
1. Simplicity of design
2. Improving compiler efficiency
3. Enhancing compiler portability
Attributes for tokens
In addition to the token itself, a lexical analyzer records some
additional information as attributes of the token.
Example: x = y

Lexeme   <token, token attribute>
x        <identifier, pointer to symbol table entry>
=        <assignment op, ->
y        <identifier, pointer to symbol table entry>
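
One possible way to represent these <token, attribute> pairs in C is sketched below. The type and function names (Token, SymbolEntry, install, make_identifier, make_assign) and the fixed-size table are assumptions for illustration, not a prescribed design.

```c
#include <stddef.h>
#include <string.h>

typedef enum { TOK_IDENTIFIER, TOK_ASSIGN_OP } TokenName;

typedef struct { char name[32]; } SymbolEntry;

typedef struct {
    TokenName name;
    SymbolEntry *attribute;  /* symbol table entry, or NULL when there is none */
} Token;

#define MAX_SYMBOLS 64
static SymbolEntry symbol_table[MAX_SYMBOLS];   /* zero-initialized */
static size_t symbol_count = 0;

/* Returns the entry for lexeme, adding it first if it is new.
   Returns NULL when the table is full. Names longer than 31
   characters are truncated in this simplified version. */
SymbolEntry *install(const char *lexeme)
{
    for (size_t i = 0; i < symbol_count; i++)
        if (strcmp(symbol_table[i].name, lexeme) == 0)
            return &symbol_table[i];
    if (symbol_count == MAX_SYMBOLS)
        return NULL;
    strncpy(symbol_table[symbol_count].name, lexeme,
            sizeof symbol_table[0].name - 1);
    return &symbol_table[symbol_count++];
}

Token make_identifier(const char *lexeme)
{
    Token t = { TOK_IDENTIFIER, install(lexeme) };
    return t;
}

Token make_assign(void)
{
    Token t = { TOK_ASSIGN_OP, NULL };   /* the "-" in the table above */
    return t;
}
```

For x = y this gives three tokens: two identifier tokens whose attributes point at different symbol table entries, and one assignment token with no attribute.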
Example:
    swap(int x, int y)
    {
        int temp;
        temp = x;
        x = y;
        y = temp;
    }

Token                 Lexemes            Pattern
keyword               int                The characters i, n, t
identifier            swap, x, y, temp   Letter followed by letters or digits
assignment operator   =                  The character =
delimiter             (, ), ;, {, }      Any one of the listed characters
Specification of tokens
In the theory of compilation, regular expressions are used
to formalize the specification of tokens.
Regular expressions are a means of specifying regular
languages.
Example:
letter(letter | digit)*
Each regular expression is a pattern specifying the
form of strings.
Regular definitions
d1 -> r1
d2 -> r2
...
dn -> rn

Example:
letter -> A | B | … | Z | a | b | … | z
digit -> 0 | 1 | … | 9
id -> letter(letter | digit)*
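
The way the definition of id is built from letter and digit can be mirrored in C with string literal concatenation and the POSIX regex functions. This is a sketch that assumes a POSIX system providing regex.h; the macro names and the helper matches are invented for the example, and the ^ and $ anchors are added so that the whole string must match.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Each regular definition is a string; later ones reuse earlier ones. */
#define LETTER "[A-Za-z]"
#define DIGIT  "[0-9]"
#define ID     "^" LETTER "(" LETTER "|" DIGIT ")*$"

/* Returns true when text matches the POSIX extended regular expression. */
bool matches(const char *pattern, const char *text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;                       /* pattern failed to compile */
    bool ok = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

With these definitions, matches(ID, "temp") holds while matches(ID, "9lives") does not.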
Identifiers

Delimiters

Numbers

Keywords

Relational Operators

Recognition of tokens
The starting point is the language grammar, which tells us
what the tokens are:
stmt -> if expr then stmt
      | if expr then stmt else stmt
expr -> term relop term
      | term
term -> id
      | number
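
The relop token in this grammar can be recognized by simulating a small transition diagram in code. The sketch below assumes the relational operators are <, <=, <>, =, > and >= (the notes do not list them at this point; the names LT, LE, EQ, NE, GT, GE follow the manifest constants used in the Lex example). The function get_relop and the extra value NOT_RELOP are hypothetical.

```c
typedef enum { LT, LE, EQ, NE, GT, GE, NOT_RELOP } Relop;

/* Simulates a transition diagram for relational operators.
   On success stores the lexeme length in *len and returns the operator;
   otherwise stores 0 and returns NOT_RELOP. */
Relop get_relop(const char *p, int *len)
{
    int state = 0;

    for (;;) {
        switch (state) {
        case 0:                         /* start state */
            if (*p == '<') state = 1;
            else if (*p == '=') { *len = 1; return EQ; }
            else if (*p == '>') state = 2;
            else { *len = 0; return NOT_RELOP; }
            p++;
            break;
        case 1:                         /* '<' has been seen */
            if (*p == '=') { *len = 2; return LE; }
            if (*p == '>') { *len = 2; return NE; }
            *len = 1;                   /* retract: lookahead is not consumed */
            return LT;
        case 2:                         /* '>' has been seen */
            if (*p == '=') { *len = 2; return GE; }
            *len = 1;                   /* retract: lookahead is not consumed */
            return GT;
        }
    }
}
```

States 1 and 2 show the retraction step: when the character after < or > does not extend the operator, only one character is counted as part of the lexeme.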
Lexical Analyzer Generator - Lex
[Figure: creating a lexical analyzer with Lex]
Lex source program (lex.l) -> Lex compiler -> lex.yy.c
lex.yy.c -> C compiler -> a.out
Input stream -> a.out -> Sequence of tokens
Structure of Lex programs

declarations
%%
translation rules (each of the form: Pattern {Action})
%%
auxiliary functions
Example
    %{
        /* definitions of manifest constants
           LT, LE, EQ, NE, GT, GE,
           IF, THEN, ELSE, ID, NUMBER, RELOP */
    %}

    /* regular definitions */
    delim    [ \t\n]
    ws       {delim}+
    letter   [A-Za-z]
    digit    [0-9]
    id       {letter}({letter}|{digit})*
    number   {digit}+(\.{digit}+)?(E[+-]?{digit}+)?

    %%
    {ws}       {/* no action and no return */}
    if         {return(IF);}
    then       {return(THEN);}
    else       {return(ELSE);}
    {id}       {yylval = (int) installID(); return(ID);}
    {number}   {yylval = (int) installNum(); return(NUMBER);}
    %%

    int installID() {/* function to install the lexeme, whose first
                        character is pointed to by yytext, and whose
                        length is yyleng, into the symbol table and
                        return a pointer thereto */
    }

    int installNum() {/* similar to installID, but puts numerical
                         constants into a separate table */
    }

End of Unit-III
