• Introduction to Compiler Design
• The Structure of a Compiler
• The Science of Building a Compiler
• Bootstrapping and Cross Compilers
• The Role of the Lexical Analyzer
• Input Buffering
• Specification of Tokens
• Recognition of Tokens
• The Lexical Analyzer Generator (LEX/FLEX)
Session-01
Introduction to Compiler Design
• Preprocessing is the first pass of any C compilation. It
processes include files, conditional compilation
instructions, and macros.
• Compilation is the second pass. It takes the output of the
preprocessor (the source code with include files and macros
expanded) and generates assembly source code.
• Assembly is the third stage of compilation. It translates the
assembly source code into machine code and can optionally
produce an assembly listing with offsets. The assembler output
is stored in an object file.
• Linking is the final stage of compilation. It takes one or
more object files or libraries as input and combines them
to produce a single (usually executable) file. In doing so, it
resolves references to external symbols, assigns final
addresses to procedures/functions and variables, and
revises code and data to reflect the new addresses (a process
called relocation).
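As a small illustration of what the preprocessing pass handles, the sketch below uses an object-like macro, a function-like macro, and a conditional-compilation block. The names BUFFER_SIZE, SQUARE, LOG, VERBOSE, area_of_square and buffer_capacity are made up for this example. Running only the preprocessor on such a file (for example with gcc -E) shows the expanded text that the compilation pass actually receives.

```c
#include <stdio.h>

/* Object-like macro: every later use of BUFFER_SIZE is replaced by 64
   before the compiler proper ever sees the code. */
#define BUFFER_SIZE 64

/* Function-like macro: expanded textually, so the parameter is
   parenthesised to keep operator precedence correct. */
#define SQUARE(x) ((x) * (x))

/* Conditional compilation: only one of these definitions survives
   preprocessing, depending on whether VERBOSE is defined
   (for example with the -DVERBOSE option of gcc or clang). */
#ifdef VERBOSE
#define LOG(msg) fprintf(stderr, "%s\n", (msg))
#else
#define LOG(msg) ((void)0)
#endif

int area_of_square(int side) {
    LOG("computing area");   /* disappears entirely unless VERBOSE is defined */
    return SQUARE(side);     /* becomes ((side) * (side)) */
}

int buffer_capacity(void) {
    return BUFFER_SIZE;      /* becomes 64 */
}
```

Because the expansion is purely textual, SQUARE(1 + 2) becomes ((1 + 2) * (1 + 2)), which is why the parentheses in the macro body matter.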
• Preprocessor:
Consider the following program:
#include <stdio.h>

int main(void)
{
    printf("whatever");
    return 0;
}
The preprocessor replaces the #include line with the contents
of the header file stdio.h, which declares printf(). The compiler
and assembler then translate the program into an object file.
Finally, the linker combines this object file with another object
file (the C library) that contains the compiled code of printf().
Session-02, 03
The Structure of a Compiler
The Science of Building a Compiler
Session-04
Bootstrapping and Cross Compilers
Bootstrapping
• Bootstrapping is widely used in compiler development.
• Bootstrapping is used to produce a self-hosting compiler, that
is, a compiler that can compile its own source code.
• A bootstrap compiler is used to compile the compiler; the
resulting compiled compiler can then be used to compile everything
else, as well as future versions of itself.
A compiler can be characterized by three languages:
1. Source language
2. Target language
3. Implementation language
The T-diagram shows a compiler $^{S}C^{T}_{I}$ for source language S
and target language T, implemented in language I.
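The composition rule behind T-diagrams and bootstrapping can be modelled in a few lines of C. This is a hypothetical sketch (the Compiler struct and compile_compiler function are illustrative names, not part of any tool): feeding a compiler for S to T that is written in I into a compiler that translates I into M yields a compiler that still translates S to T, but is now written in M.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of a T-diagram: a compiler that translates programs
   in language `source` into language `target` and is itself written in
   language `impl`. */
typedef struct {
    const char *source;
    const char *target;
    const char *impl;
} Compiler;

/* Feed compiler c (written in c.impl) as input to compiler host.
   This only works if host accepts c.impl as its source language.
   The result still translates c.source into c.target, but is now
   written in host.target. Returns false (leaving *out unchanged)
   if the languages do not line up. */
bool compile_compiler(Compiler c, Compiler host, Compiler *out) {
    if (strcmp(c.impl, host.source) != 0) {
        return false;
    }
    out->source = c.source;
    out->target = c.target;
    out->impl = host.target;
    return true;
}
```

In the classic bootstrapping step, a compiler for L written in L itself (L to M, implemented in L) is fed to an existing L compiler that runs on machine M. The result is an L to M compiler implemented in M, which from then on can compile its own source code.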
More Examples of Bootstrapping
CROSS COMPILER
LEXICAL ANALYSIS
Outline
Role of the lexical analyzer
Specification of tokens
Recognition of tokens
Lexical analyzer generator
Finite automata
Design of a lexical analyzer generator
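The role of the lexical analyzer
Figure: the lexical analyzer reads the source program and returns a token each time the parser calls getNextToken; the parser passes its result on to semantic analysis, and both the lexical analyzer and the parser consult the symbol table.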
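Lexemes, Tokens, and Patterns
Lexeme: a sequence of input characters that comprises a single
token is called a lexeme.
Pattern: a description of the form that the lexemes of a token
may take, usually written as a regular expression.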
Lexical Analysis
Lexical analyzer: reads input characters and produces a
sequence of tokens as output, one token for each call to nexttoken().
Its job is to identify each basic element of the program.
Token: a group of characters having a collective meaning.
Example: in the statement
double pi = 3.14159;
the lexemes are double, pi, =, 3.14159 and ;.
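The sketch below shows how a hand-written lexical analyzer might split double pi = 3.14159; into tokens, handing one token to the caller per call, in the spirit of nexttoken()/getNextToken. The token kinds, the Lexer struct, the small keyword list and the name get_next_token are illustrative assumptions, and numbers are limited to digits with an optional fraction.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds for this sketch. */
typedef enum { TOK_KEYWORD, TOK_ID, TOK_NUMBER, TOK_ASSIGN, TOK_SEMI,
               TOK_EOF, TOK_ERROR } TokenKind;

typedef struct {
    TokenKind kind;
    const char *lexeme;  /* points into the input buffer */
    size_t length;       /* number of characters in the lexeme */
} Token;

typedef struct {
    const char *pos;     /* next unread input character */
} Lexer;

/* Illustrative keyword list; a real language has many more. */
static int is_keyword(const char *s, size_t n) {
    static const char *const keywords[] = { "double", "int", "return" };
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (strlen(keywords[i]) == n && strncmp(keywords[i], s, n) == 0)
            return 1;
    }
    return 0;
}

/* Returns the next token each time the parser asks for one. */
Token get_next_token(Lexer *lx) {
    while (isspace((unsigned char)*lx->pos)) lx->pos++;  /* skip white space */
    const char *start = lx->pos;
    Token t = { TOK_ERROR, start, 0 };
    if (*start == '\0') { t.kind = TOK_EOF; return t; }
    if (isalpha((unsigned char)*start)) {                /* letter (letter | digit)* */
        while (isalnum((unsigned char)*lx->pos)) lx->pos++;
        t.length = (size_t)(lx->pos - start);
        t.kind = is_keyword(start, t.length) ? TOK_KEYWORD : TOK_ID;
        return t;
    }
    if (isdigit((unsigned char)*start)) {                /* digit+ (. digit+)? */
        while (isdigit((unsigned char)*lx->pos)) lx->pos++;
        if (*lx->pos == '.' && isdigit((unsigned char)lx->pos[1])) {
            lx->pos++;
            while (isdigit((unsigned char)*lx->pos)) lx->pos++;
        }
        t.kind = TOK_NUMBER;
        t.length = (size_t)(lx->pos - start);
        return t;
    }
    lx->pos++;                                           /* single-character tokens */
    t.length = 1;
    if (*start == '=') t.kind = TOK_ASSIGN;
    else if (*start == ';') t.kind = TOK_SEMI;
    return t;                                            /* anything else: TOK_ERROR */
}
```

A parser would create a Lexer pointing at the source text and call get_next_token(&lx) repeatedly until it receives TOK_EOF. Each Token records where its lexeme starts and how long it is, so no characters are copied.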
09/14/2023
Basic functions of Lexical Analysis:
• Identifying constants.
Example:
letter -> A | B | … | Z | a | b | … | z
digit  -> 0 | 1 | … | 9
id     -> letter (letter | digit)*
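The regular definition for id corresponds to a small deterministic finite automaton. The following sketch encodes it as a transition table; the state names, character classes and the function is_identifier are hypothetical, and only the ASCII letters A-Z and a-z count as letters, as in the definition above.

```c
#include <stdbool.h>

/* DFA states for id -> letter (letter | digit)* */
enum { START, IN_ID, DEAD, NUM_STATES };
/* Input character classes */
enum { LETTER, DIGIT, OTHER, NUM_CLASSES };

static int char_class(char c) {
    unsigned char u = (unsigned char)c;
    if ((u >= 'A' && u <= 'Z') || (u >= 'a' && u <= 'z')) return LETTER;
    if (u >= '0' && u <= '9') return DIGIT;
    return OTHER;
}

/* transition[state][class] gives the next state. */
static const int transition[NUM_STATES][NUM_CLASSES] = {
    /*            LETTER  DIGIT  OTHER */
    /* START */ { IN_ID,  DEAD,  DEAD },
    /* IN_ID */ { IN_ID,  IN_ID, DEAD },
    /* DEAD  */ { DEAD,   DEAD,  DEAD },
};

/* Returns true if the whole string s matches the pattern id. */
bool is_identifier(const char *s) {
    int state = START;
    for (; *s != '\0'; s++) {
        state = transition[state][char_class(*s)];
    }
    return state == IN_ID;   /* IN_ID is the only accepting state */
}
```

Keeping the automaton in a table rather than in nested if statements is the same idea a lexical analyzer generator uses: only the table changes from one pattern to another, while the driver loop stays the same.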
• Identifiers
• Delimiters
• Numbers
• Keywords
• Relational Operators
Recognition of tokens
The starting point is the language grammar, which shows which
tokens the lexical analyzer must recognize:
stmt -> if expr then stmt
      | if expr then stmt else stmt
      | ε
expr -> term relop term
      | term
term -> id
      | number
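To recognize the relop token in this grammar, a lexical analyzer typically simulates a transition diagram. The sketch assumes the Pascal-style lexemes <, <=, =, <>, > and >= that the Dragon Book uses with this grammar, and borrows the attribute names LT, LE, EQ, NE, GT, GE from the Lex example in this unit; match_relop and NOT_RELOP are hypothetical names.

```c
#include <stddef.h>

/* Attribute values for the relop token. */
typedef enum { LT, LE, EQ, NE, GT, GE, NOT_RELOP } Relop;

/* Simulates the relop transition diagram on the characters at s and
   stores the number of characters belonging to the lexeme in *len. */
Relop match_relop(const char *s, size_t *len) {
    switch (s[0]) {
    case '<':
        if (s[1] == '=') { *len = 2; return LE; }
        if (s[1] == '>') { *len = 2; return NE; }
        *len = 1; return LT;   /* s[1] is not part of the lexeme: "retract" */
    case '=':
        *len = 1; return EQ;
    case '>':
        if (s[1] == '=') { *len = 2; return GE; }
        *len = 1; return GT;   /* s[1] is not part of the lexeme: "retract" */
    default:
        *len = 0; return NOT_RELOP;
    }
}
```

For a plain < or >, the diagram has to look at one extra character before it can decide; reporting a length of 1 tells the caller not to consume that character, which corresponds to retracting the input pointer.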
Lexical Analyzer Generator - Lex
Figure: the Lex source program lex.l is translated by the Lex compiler into lex.yy.c; lex.yy.c is then compiled by the C compiler into a.out.
Structure of a Lex program:
declarations
%%
translation rules
%%
auxiliary functions
Each translation rule has the form: Pattern {Action}
Example
%{
/* definitions of manifest constants
   LT, LE, EQ, NE, GT, GE,
   IF, THEN, ELSE, ID, NUMBER, RELOP */
%}

/* regular definitions */
delim     [ \t\n]
ws        {delim}+
letter    [A-Za-z]
digit     [0-9]
id        {letter}({letter}|{digit})*
number    {digit}+(\.{digit}+)?(E[+-]?{digit}+)?

%%

{ws}      {/* no action and no return */}
if        {return(IF);}
then      {return(THEN);}
else      {return(ELSE);}
{id}      {yylval = (int) installID(); return(ID);}
{number}  {yylval = (int) installNum(); return(NUMBER);}
…

%%

int installID() {/* function to install the lexeme, whose
                    first character is pointed to by yytext,
                    and whose length is yyleng, into the
                    symbol table and return a pointer thereto */
}

int installNum() {/* similar to installID, but puts numerical
                     constants into a separate table */
}
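The example leaves installID() as a stub. One possible way to fill it in is sketched below: a fixed-size table searched linearly, returning the index of the entry (rather than a pointer) so that it fits into the int-valued yylval. To keep it testable outside Lex, the hypothetical install_id takes the lexeme start and length as parameters, which in a Lex file would be yytext and yyleng; the table sizes are also assumptions.

```c
#include <string.h>

#define MAX_SYMBOLS 256
#define MAX_LEXEME  64

/* Hypothetical symbol table: each entry stores only the lexeme text. */
static char symbol_table[MAX_SYMBOLS][MAX_LEXEME];
static int symbol_count = 0;

/* Installs the lexeme text[0..length-1] if it is not already present
   and returns its index in the table. Returns -1 if the lexeme is empty
   or too long, or if the table is full. */
int install_id(const char *text, int length) {
    if (length <= 0 || length >= MAX_LEXEME) return -1;
    for (int i = 0; i < symbol_count; i++) {
        if (strncmp(symbol_table[i], text, (size_t)length) == 0 &&
            symbol_table[i][length] == '\0') {
            return i;                      /* already installed: reuse entry */
        }
    }
    if (symbol_count == MAX_SYMBOLS) return -1;
    memcpy(symbol_table[symbol_count], text, (size_t)length);
    symbol_table[symbol_count][length] = '\0';
    return symbol_count++;
}
```

Because repeated occurrences of the same identifier map to the same entry, the parser sees one attribute value per distinct name. A production compiler would normally replace the linear search with a hash table; installNum could follow the same pattern with a separate table for numeric constants.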
End of Unit-III