COMPILER CONSTRUCTION (CS-462)
(LECTURE 7)
(RELATED TO ASSIGNMENT # 1)
Processing of Language
- Source code
- Preprocessor: handles #include, #define, #ifdef, etc.; reports trivial errors
- Preprocessed source code (foo.i)
- Lexical Analysis, Syntax Analysis, Semantic Analysis: report errors
- Abstract Syntax Tree
Lexical Analysis Process
Preprocessed source code, read char by char:
if (b == 0) a = b;
Token stream:
if ( b == 0 ) a = b ;
Lexical analysis
- Transform multi-character input stream to token stream
- Reduce length of program representation (remove spaces)
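The transformation above can be sketched in a few lines of Python. This is a minimal illustration, not the scanner used in the course: the regular expression and the function name to_lexemes are assumptions made for this example, and only the handful of lexeme shapes needed for the sample statement are covered.

```python
import re

# Assumed pattern for this sketch: identifiers/keywords, integers,
# the two-character operator ==, or any other single non-space character.
LEXEME_RE = re.compile(r"[A-Za-z_]\w*|\d+|==|\S")

def to_lexemes(source):
    """Split a character stream into lexemes, discarding whitespace."""
    return LEXEME_RE.findall(source)

print(to_lexemes("if (b == 0) a = b;"))
# ['if', '(', 'b', '==', '0', ')', 'a', '=', 'b', ';']
```

The 18 input characters become 10 lexemes, which is the reduction in program representation mentioned above.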
Tokens, Patterns, Lexemes
For the keyword if, the token, the pattern, and the lexeme are all: if
Source File
...
int num1 = a+(b/c1);
...
Read character by character:
i n t n u m 1 = a + ( b / c 1 ) ;
Lexical Errors
What are the possible words
recognized by a Lexical Analyzer?
Keywords
Identifiers
Operators
Numeric constants
Character constants
Punctuations / Special Symbols
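As a rough sketch of how these word classes might be recognized, the following Python code classifies the lexemes of the sample line int num1 = a+(b/c1);. The class names, the keyword subset, the operator set, and the function name classify are all assumptions chosen for this illustration and are not taken from any particular language definition. Input that matches no class is reported as a lexical error.

```python
import re

# Hypothetical token classes for a small C-like subset (assumed for this sketch).
TOKEN_SPEC = [
    ("KEYWORD",     r"\b(?:int|float|char|if|else|while|return)\b"),
    ("IDENTIFIER",  r"[A-Za-z_]\w*"),
    ("NUMBER",      r"\d+(?:\.\d+)?"),
    ("CHAR_CONST",  r"'(?:\\.|[^'\\])'"),
    ("OPERATOR",    r"==|!=|<=|>=|[-+*/=<>]"),
    ("PUNCTUATION", r"[(){};,]"),
    ("SKIP",        r"\s+"),
]
# One master pattern with a named group per class.
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def classify(source):
    """Return a list of (token_class, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER_RE.match(source, pos)
        if m is None:
            # No class matches here: a lexical error.
            raise ValueError(f"lexical error at position {pos}: {source[pos]!r}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

for token in classify("int num1 = a+(b/c1);"):
    print(token)
```

For the sample line this yields one keyword (int), four identifiers (num1, a, b, c1), three operators (=, +, /), and three punctuation symbols.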
Working of Lexical
Analyzer
We must understand how the Lexical Analyzer
recognizes the different tokens in source code.
We use regular expressions (RE) to describe them.
RE alone cannot resolve some ambiguities that
can occur during the scanning process.
Ambiguities in Lexical
Analysis
Two types of ambiguity can occur while
developing a Lexical Analyzer:
- A keyword also matches the pattern for identifiers.
- It may be unclear how much of the input string should form a single token.
The language definition should provide
‘DISAMBIGUATING RULES’ to solve these problems.
Disambiguating Rules
Priority Rule
Longest Substring Principle
Priority Rule
int ab = 67.1;
[0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11]
i n t a b = 6 7 . 1 ;
Buffer pointers: lexeme_start, forward
Token produced: INT
Components: DRIVER, … DFA / T.T (transition table), Symbol Table
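One common reading of the priority rule is that when a lexeme matches both the keyword pattern and the identifier pattern, the keyword interpretation wins. The Python sketch below illustrates that reading on the buffer shown above, using the two pointers lexeme_start and forward. The keyword set, the token names INT and ID, and the function scan_word are assumptions made for this example; it is not the driver or transition table from the slide.

```python
# Assumed keyword subset for this sketch.
KEYWORDS = {"int", "float", "if", "else", "while", "return"}

def scan_word(buffer, lexeme_start):
    """Scan one identifier-shaped lexeme starting at lexeme_start.

    Assumes buffer[lexeme_start] is a letter or underscore.
    Returns (token_name, lexeme, forward), where forward is the index
    of the first character after the lexeme.
    """
    forward = lexeme_start
    while forward < len(buffer) and (buffer[forward].isalnum() or buffer[forward] == "_"):
        forward += 1
    lexeme = buffer[lexeme_start:forward]
    # Priority rule: a word found in the keyword table is a keyword,
    # even though it also fits the identifier pattern.
    token = lexeme.upper() if lexeme in KEYWORDS else "ID"
    return token, lexeme, forward

buffer = "int ab = 67.1;"
print(scan_word(buffer, 0))  # ('INT', 'int', 3)
print(scan_word(buffer, 4))  # ('ID', 'ab', 6)
```

After each call, the scanner would set lexeme_start to the returned forward position (skipping blanks) before scanning the next lexeme.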
Principle of Longest
Substring
Also called ‘Principle of Maximal Munch’.
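Under this principle, when several patterns match at the current position, the scanner takes the longest matching lexeme; a tie in length can then be settled by the priority rule. The Python sketch below shows one way to express this. The pattern list, token names, and the functions longest_match and tokenize are hypothetical and exist only for this illustration.

```python
import re

# Assumed patterns, listed in priority order (earlier wins a tie).
PATTERNS = [
    ("IF",     re.compile(r"if")),
    ("ID",     re.compile(r"[A-Za-z_]\w*")),
    ("NUM",    re.compile(r"\d+(?:\.\d+)?")),
    ("ASSIGN", re.compile(r"=")),
    ("EQ",     re.compile(r"==")),
    ("SEMI",   re.compile(r";")),
]

def longest_match(source, pos):
    """Try every pattern at pos and keep the longest lexeme."""
    best = None
    for name, pattern in PATTERNS:
        m = pattern.match(source, pos)
        # Strictly longer wins; on a tie the earlier pattern is kept.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

def tokenize(source):
    tokens, pos = [], 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        best = longest_match(source, pos)
        if best is None:
            raise ValueError(f"lexical error at position {pos}: {source[pos]!r}")
        tokens.append(best)
        pos += len(best[1])
    return tokens

print(tokenize("ifx == 67.1;"))
# [('ID', 'ifx'), ('EQ', '=='), ('NUM', '67.1'), ('SEMI', ';')]
```

Here ifx becomes one identifier rather than the keyword if followed by x, == becomes one operator rather than two assignments, and 67.1 becomes one number. For the input if alone, IF and ID match the same length, so the earlier entry IF is chosen.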