Unit - 2: Lexical Analyzer
GTU # 3170701
Symbol Table
Upon receiving a “Get next token” command from the parser, the lexical analyzer reads input
characters until it can identify the next token.
The lexical analyzer also strips out comments and white space (blanks, tabs, and newline
characters) from the source program.
Lexeme = : token Operator1
Lexeme 45 : token Constant1
Lexemes
Lexemes of identifier: total, sum
Lexemes of operator: =, +
Lexemes of constant: 45
Prof. Dixita B Kagathara #3170701 (CD) Unit 2 – Lexical Analysis 8
Input buffering
There are mainly two techniques for input buffering:
1. Buffer pairs
2. Sentinels
Buffer Pair
The lexical analyzer scans the input string from left to right, one character at a time.
The buffer is divided into two N-character halves, where N is the number of characters in one
disk block.
[Figure: buffer pair holding E = Mi * C * * 2 eof, with the lexeme_beginning and forward pointers marking the current lexeme]
In buffer pairs, each time we move the forward pointer we must check that we have not moved
off one of the buffers.
Thus, for each character read, we make two tests: one for the end of the buffer and one to
determine which character was read.
We can reduce the two tests to one if we extend each buffer to hold a sentinel character at the
end, which combines the buffer-end test with the test for the current character.
The sentinel is a special character that cannot be part of the source program, and a natural
choice is the character EOF.
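The sentinel idea can be sketched in Python as follows. This is a minimal illustration, not the slides' own code: the class name `BufferPair`, the tiny half size `N = 4`, and the use of `"\0"` as the EOF sentinel are all assumptions made for the example, and the lexeme_beginning pointer is left out.

```python
import io

EOF = "\0"   # assumed sentinel: a character that never occurs in the source text
N = 4        # assumed half size, kept tiny so the reloads are easy to follow


class BufferPair:
    """Two N-character halves, each followed by one sentinel slot."""

    def __init__(self, stream):
        self.stream = stream
        # layout: [half 0: N chars][EOF][half 1: N chars][EOF]
        self.buf = [EOF] * (2 * N + 2)
        self._load(0)
        self.forward = 0

    def _load(self, half):
        start = half * (N + 1)
        data = self.stream.read(N)
        for k, ch in enumerate(data):
            self.buf[start + k] = ch
        # A full read puts EOF in the sentinel slot; a short read puts it
        # inside the half, where it marks the real end of the input.
        self.buf[start + len(data)] = EOF

    def next_char(self):
        ch = self.buf[self.forward]
        if ch != EOF:                      # the only test in the common case
            self.forward += 1
            return ch
        if self.forward == N:              # sentinel at end of first half
            self._load(1)
            self.forward = N + 1
            return self.next_char()
        if self.forward == 2 * N + 1:      # sentinel at end of second half
            self._load(0)
            self.forward = 0
            return self.next_char()
        return None                        # EOF inside a half: end of input


def read_all(text):
    bp = BufferPair(io.StringIO(text))
    out = []
    while (c := bp.next_char()) is not None:
        out.append(c)
    return "".join(out)
```

Only when the character read equals the sentinel does the code go on to ask which end was reached, so ordinary characters cost a single comparison.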
L = Zero or More Occurrences of a = a*
𝜖, a, aa, aaa, aaaa, aaaaa, … (infinite)
L = One or More Occurrences of a = a+
a, aa, aaa, aaaa, aaaaa, … (infinite)
where L is letter and D is digit
Transition diagram notation: a circle is a state, an arrow is a transition, an arrow labeled start marks the start state, and a double circle is a final state.
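Transition diagram for relational operators (start state 0):
state 0 on < : go to state 1
state 1 on = : state 2, return (relop,LE)
state 1 on > : state 3, return (relop,NE)
state 1 on other : state 4, return (relop,LT)
state 0 on = : state 5, return (relop,EQ)
state 0 on > : go to state 6
state 6 on = : state 7, return (relop,GE)
state 6 on other : state 8, return (relop,GT)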
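The relational-operator diagram can be hand-coded as a small state machine. The sketch below is one possible rendering in Python; the function name `relop` and the convention of returning the position after the lexeme are assumptions for illustration. On the "other" edges the lookahead character is not consumed.

```python
def relop(text, pos=0):
    """Return ((token, attribute), next_pos), or None if no relop starts at pos."""
    state = 0
    i = pos
    while True:
        ch = text[i] if i < len(text) else ""   # "" stands for end of input
        if state == 0:
            if ch == "<":
                state = 1
            elif ch == "=":
                return ("relop", "EQ"), i + 1   # state 5
            elif ch == ">":
                state = 6
            else:
                return None                     # not a relational operator
        elif state == 1:
            if ch == "=":
                return ("relop", "LE"), i + 1   # state 2
            if ch == ">":
                return ("relop", "NE"), i + 1   # state 3
            return ("relop", "LT"), i           # state 4: lookahead not consumed
        else:  # state 6
            if ch == "=":
                return ("relop", "GE"), i + 1   # state 7
            return ("relop", "GT"), i           # state 8: lookahead not consumed
        i += 1
```

For example, `relop("<= 5")` yields the LE token and position 2, while `relop("<5")` yields LT and position 1, leaving the `5` for the next token.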
[Figure: transition diagram for numeric constants; surviving edge labels: E, digit]
Example lexemes: 5280, 39.37, 1.894 E - 4, 2.56 E + 7, 45 E + 6, 96 E 2
Hard coding and automatic generation lexical analyzers
Lexical analysis is about identifying patterns in the input.
To recognize a pattern, a transition diagram is constructed and coded by hand.
This is known as a hard-coded lexical analyzer.
Example: an identifier in ‘C’ must begin with a letter, and the remaining characters are
either letters or digits.
To recognize this pattern, a hard-coded lexical analyzer works from a transition diagram.
An automatically generated lexical analyzer is instead produced from a special notation given as input.
For example, the lex compiler tool takes regular expressions as input and generates a pattern
matcher for them.
[Figure: transition diagram for identifiers with states 1, 2, 3; edge labels: Start, Letter, Letter or digit]
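A hard-coded recognizer for the identifier pattern might look like the following Python sketch. The function name `match_identifier` is hypothetical, and the move from state 2 to the final state 3 on any character that is not a letter or digit is an assumption about the diagram, since that edge label did not survive in the text.

```python
def is_letter(ch):
    return len(ch) == 1 and ch.isascii() and ch.isalpha()


def is_digit(ch):
    return len(ch) == 1 and ch.isascii() and ch.isdigit()


def match_identifier(text, pos=0):
    """Return the identifier starting at pos, or None if there is none."""
    state = 1
    i = pos
    while True:
        ch = text[i] if i < len(text) else ""   # "" stands for end of input
        if state == 1:
            if is_letter(ch):                   # first character must be a letter
                state = 2
                i += 1
            else:
                return None
        elif state == 2:
            if is_letter(ch) or is_digit(ch):   # loop on letter or digit
                i += 1
            else:
                state = 3                       # assumed "other" edge
        else:
            # state 3: accept; the lookahead character is not consumed
            return text[pos:i]
```

Each `if`/`elif` branch corresponds to one state of the diagram, which is what makes this style "hard coded": changing the pattern means editing the code rather than a specification.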
[Figures: NFA fragments with start state i and final state f, combining N(s) and N(t) through 𝜖-transitions]
Regular expression to NFA using Thompson's rule
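a*b
[Figure: NFA with states 1 to 5 and edge labels 𝜖, a, 𝜖, b along the chain, plus an 𝜖 arc for the star]
b*ab
[Figure: NFA with states 1 to 6 and edge labels 𝜖, b, 𝜖, a, b along the chain, plus 𝜖 arcs for the star]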
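OPERATION: DESCRIPTION
𝜖-closure(s): Set of NFA states reachable from NFA state s on 𝜖-transitions alone.
𝜖-closure(T): Set of NFA states reachable from some NFA state s in T on 𝜖-transitions alone.
move(T, a): Set of NFA states to which there is a transition on input symbol a from some NFA state s in T.
Subset construction algorithm:
initially, 𝜖-closure(s0) is the only state in Dstates and it is unmarked
while there is an unmarked state T in Dstates do
  mark T
  for each input symbol a do
    U = 𝜖-closure(move(T, a))
    if U is not in Dstates then
      add U as unmarked state to Dstates
    Dtran[T, a] = U
  end
end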
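(a|b)*abb
[Figure: NFA for (a|b)*abb with start state 0 and final state 10]
𝜖-transitions: 0 to 1, 0 to 7, 1 to 2, 1 to 4, 3 to 6, 5 to 6, 6 to 1, 6 to 7
Transitions on a: 2 to 3, 7 to 8
Transitions on b: 4 to 5, 8 to 9, 9 to 10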
𝜖- Closure(0)= {0, 1, 7, 2, 4}
= {0,1,2,4,7} ---- A
A= {0, 1, 2, 4, 7}
Move(A,a) = {3,8}
𝜖- Closure(Move(A,a)) = {3, 6, 7, 1, 2, 4, 8}
= {1,2,3,4,6,7,8} ---- B
Conversion from NFA to DFA
A= {0, 1, 2, 4, 7}
Move(A,b) = {5}
𝜖- Closure(Move(A,b)) = {5, 6, 7, 1, 2, 4}
= {1,2,4,5,6,7} ---- C
B = {1, 2, 3, 4, 6, 7, 8}
Move(B,a) = {3,8}
𝜖- Closure(Move(B,a)) = {3, 6, 7, 1, 2, 4, 8}
= {1,2,3,4,6,7,8} ---- B
B= {1, 2, 3, 4, 6, 7, 8}
Move(B,b) = {5,9}
𝜖- Closure(Move(B,b)) = {5, 6, 7, 1, 2, 4, 9}
= {1,2,4,5,6,7,9} ---- D
C= {1, 2, 4, 5, 6 ,7}
Move(C,a) = {3,8}
𝜖- Closure(Move(C,a)) = {3, 6, 7, 1, 2, 4, 8}
= {1,2,3,4,6,7,8} ---- B
C= {1, 2, 4, 5, 6, 7}
Move(C,b) = {5}
𝜖- Closure(Move(C,b))= {5, 6, 7, 1, 2, 4}
= {1,2,4,5,6,7} ---- C
D= {1, 2, 4, 5, 6, 7, 9}
Move(D,a) = {3,8}
𝜖- Closure(Move(D,a)) = {3, 6, 7, 1, 2, 4, 8}
= {1,2,3,4,6,7,8} ---- B
D= {1, 2, 4, 5, 6, 7, 9}
Move(D,b) = {5,10}
𝜖- Closure(Move(D,b)) = {5, 6, 7, 1, 2, 4, 10}
= {1,2,4,5,6,7,10} ---- E
E= {1, 2, 4, 5, 6, 7, 10}
Move(E,a) = {3,8}
𝜖- Closure(Move(E,a)) = {3, 6, 7, 1, 2, 4, 8}
= {1,2,3,4,6,7,8} ---- B
E= {1, 2, 4, 5, 6, 7, 10}
Move(E,b)= {5}
𝜖- Closure(Move(E,b))= {5,6,7,1,2,4}
= {1,2,4,5,6,7} ---- C
Transition Table
States a b
A = {0,1,2,4,7} B C
B = {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7} B C
D = {1,2,4,5,6,7,9} B E
E = {1,2,4,5,6,7,10} B C
[Figure: DFA with start state A and the transitions of the table above]
Note:
• Accepting state in NFA is 10
• 10 is an element of E
• So, E is the accepting state in the DFA
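The whole conversion can be checked mechanically. The Python sketch below encodes the NFA for (a|b)*abb as a dictionary (the transition list is inferred from the 𝜖-closure and Move results worked out above, and the names `NFA`, `eps_closure`, `move` and `subset_construction` are chosen for this illustration) and then runs the subset construction.

```python
EPS = None  # key used for epsilon transitions in this sketch

# NFA for (a|b)*abb: state -> {symbol: set of next states}
NFA = {
    0: {EPS: {1, 7}},
    1: {EPS: {2, 4}},
    2: {"a": {3}},
    3: {EPS: {6}},
    4: {"b": {5}},
    5: {EPS: {6}},
    6: {EPS: {1, 7}},
    7: {"a": {8}},
    8: {"b": {9}},
    9: {"b": {10}},
    10: {},
}
START, ACCEPT = 0, 10


def eps_closure(states):
    """All NFA states reachable from `states` on epsilon transitions alone."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in NFA[s].get(EPS, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(states, symbol):
    """All NFA states reachable from some state in `states` on `symbol`."""
    return {t for s in states for t in NFA[s].get(symbol, ())}


def subset_construction(alphabet="ab"):
    start = eps_closure({START})
    dstates = [start]          # DFA states in order of discovery
    dtran = {}
    unmarked = [start]
    while unmarked:
        T = unmarked.pop(0)    # mark T
        for a in alphabet:
            U = eps_closure(move(T, a))
            if U not in dstates:
                dstates.append(U)
                unmarked.append(U)
            dtran[T, a] = U
    return dstates, dtran


dstates, dtran = subset_construction()
names = {S: chr(ord("A") + i) for i, S in enumerate(dstates)}
table = {names[S]: tuple(names[dtran[S, a]] for a in "ab") for S in dstates}
accepting = [names[S] for S in dstates if ACCEPT in S]
```

Because unmarked states are processed in discovery order, the states come out named A to E in the same order as in the worked steps, and `accepting` contains only the state whose set includes NFA state 10.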
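States a b
A B C
B B D
C B C
D B E
E B C
Initial partition of {A, B, C, D, E}:
Nonaccepting States: {A, B, C, D}
Accepting States: {E}
On input b, D goes to E while A, B and C stay within the nonaccepting group, so {A, B, C, D} splits into {A, B, C} and {D}.
On input b, B goes to D while A and C go to C, so {A, B, C} splits into {A, C} and {B}.
Now no more splitting is possible.
If we choose A as the representative for group (AC), then we obtain the reduced transition table.
Optimized Transition Table
States a b
A B A
B B D
D B E
E B A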
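The splitting procedure can be sketched as repeated partition refinement. The Python below is an illustrative version, not necessarily the exact procedure the slides follow: the names `minimize` and `reduced_table` are hypothetical, and the representative of each group is assumed to be its alphabetically smallest state, which gives A for the group (AC).

```python
DFA = {
    "A": {"a": "B", "b": "C"},
    "B": {"a": "B", "b": "D"},
    "C": {"a": "B", "b": "C"},
    "D": {"a": "B", "b": "E"},
    "E": {"a": "B", "b": "C"},
}
ACCEPTING = {"E"}


def minimize(dfa, accepting, alphabet="ab"):
    """Split groups until states in a group agree on the target group of every symbol."""
    groups = [g for g in (set(dfa) - set(accepting), set(accepting)) if g]
    while True:
        def group_of(state):
            return next(i for i, g in enumerate(groups) if state in g)

        new_groups = []
        for g in groups:
            buckets = {}
            for s in sorted(g):
                # states with the same key cannot be told apart in this round
                key = tuple(group_of(dfa[s][a]) for a in alphabet)
                buckets.setdefault(key, set()).add(s)
            new_groups.extend(buckets.values())
        if len(new_groups) == len(groups):   # no group was split
            return new_groups
        groups = new_groups


def reduced_table(dfa, groups, alphabet="ab"):
    rep = {s: min(g) for g in groups for s in g}   # assumed choice of representative
    return {rep[s]: {a: rep[dfa[s][a]] for a in alphabet} for s in dfa}


groups = minimize(DFA, ACCEPTING)
optimized = reduced_table(DFA, groups)
```

On this DFA the refinement runs three rounds: the first separates D, the second separates B, and the third finds nothing left to split.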
firstpos(n)
The set of positions that can match the first symbol of a string generated by the subtree at node n.
lastpos(n)
The set of positions that can match the last symbol of a string generated by the subtree at node n.
followpos(i)
The set of positions that can follow position i in the tree.
1. If n is a cat node with left child c1 and right child c2, and i is a position in lastpos(c1), then all positions in firstpos(c2) are in followpos(i).
2. If n is a * node and i is a position in lastpos(n), then all positions in firstpos(n) are in followpos(i).
Rule for nullable(n):
n is a cat node (c1 . c2): nullable(c1) and nullable(c2)
For the leaves a (position 1) and b (position 2), nullable is False.
Here, * is the only nullable node.
Conversion from regular expression to DFA
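Rules for firstpos(n):
n is an or node (c1 | c2): firstpos(c1) U firstpos(c2)
n is a cat node (c1 . c2): if (nullable(c1)) then firstpos(c1) U firstpos(c2) else firstpos(c1)
n is a * node (c1*): firstpos(c1)
Rules for lastpos(n):
n is an or node (c1 | c2): lastpos(c1) U lastpos(c2)
n is a cat node (c1 . c2): if (nullable(c2)) then lastpos(c1) U lastpos(c2) else lastpos(c2)
n is a * node (c1*): lastpos(c1)
Annotated syntax tree (node: firstpos, lastpos):
leaf a, position 1: {1}, {1}
leaf b, position 2: {2}, {2}
| node: {1,2}, {1,2}
* node: {1,2}, {1,2}
leaf a, position 3: {3}, {3}
leaf b, position 4: {4}, {4}
leaf b, position 5: {5}, {5}
end-marker leaf, position 6: {6}, {6}
cat nodes, from bottom to root: {1,2,3}, {3}; {1,2,3}, {4}; {1,2,3}, {5}; {1,2,3}, {6}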
Initial state = firstpos(root) = {1,2,3} ----- A
State C
δ( (1,2,3,5),a) = followpos(1) U followpos(3)
= (1,2,3) U (4) = {1,2,3,4} ----- B
δ( (1,2,3,5),b) = followpos(2) U followpos(5)
= (1,2,3) U (6) = {1,2,3,6} ----- D
States a b
A={1,2,3} B A
B={1,2,3,4} B C
C={1,2,3,5} B D
D={1,2,3,6} B A
[Figure: DFA with states A, B, C, D and the transitions of the table above]
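The direct construction can be reproduced with a short recursive walk over the syntax tree. The sketch below assumes the regular expression is (a|b)*abb followed by an end marker `#` at position 6, which is what the position sets and the table above are consistent with; the tuple encoding of the tree and the names `analyze` and `build_dfa` are illustrative choices.

```python
from collections import defaultdict


def leaf(sym, pos):
    return ("leaf", sym, pos)


# Assumed syntax tree for (a|b)*abb# with positions 1..6
TREE = ("cat",
        ("cat",
         ("cat",
          ("cat", ("star", ("or", leaf("a", 1), leaf("b", 2))), leaf("a", 3)),
          leaf("b", 4)),
         leaf("b", 5)),
        leaf("#", 6))

followpos = defaultdict(set)
symbol_at = {}


def analyze(n):
    """Return (nullable, firstpos, lastpos) of node n and fill in followpos."""
    kind = n[0]
    if kind == "leaf":
        _, sym, pos = n
        symbol_at[pos] = sym
        return False, {pos}, {pos}
    if kind == "or":
        n1, f1, l1 = analyze(n[1])
        n2, f2, l2 = analyze(n[2])
        return n1 or n2, f1 | f2, l1 | l2
    if kind == "cat":
        n1, f1, l1 = analyze(n[1])
        n2, f2, l2 = analyze(n[2])
        for i in l1:                       # followpos rule 1 (cat node)
            followpos[i] |= f2
        first = (f1 | f2) if n1 else f1
        last = (l1 | l2) if n2 else l2
        return n1 and n2, first, last
    # star node
    _, f1, l1 = analyze(n[1])
    for i in l1:                           # followpos rule 2 (star node)
        followpos[i] |= f1
    return True, f1, l1


def build_dfa(alphabet="ab"):
    _, first, _ = analyze(TREE)
    start = frozenset(first)
    states, table, work = [start], {}, [start]
    while work:
        S = work.pop(0)
        for a in alphabet:
            # union of followpos(i) over positions i in S labeled with a
            U = frozenset(p for i in S if symbol_at[i] == a for p in followpos[i])
            if U not in states:
                states.append(U)
                work.append(U)
            table[S, a] = U
    return states, table


states, table = build_dfa()
names = {S: chr(ord("A") + i) for i, S in enumerate(states)}
named = {names[S]: tuple(names[table[S, a]] for a in "ab") for S in states}
```

The state containing position 6 (the end marker) is the accepting one, which here is D.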
An Elementary Scanner Design & Its Implementation
Tasks of Scanner
1. The main purpose of the scanner is to return the next input token to the parser.
2. The scanner must identify the complete token and sometimes differentiate between keywords
and identifiers.
3. The scanner may perform symbol-table maintenance, inserting identifiers, literals, and constants
into the tables.
4. The scanner also eliminates white space.
Regular Expression: Tokens can be specified using regular expressions.
Example: id → letter (letter | digit)*
Implementation of Lexical Analyzer (Lex)
Lex is tool or a language which is useful for generating a lexical Analyzer and it specifies the
regular expression
Regular expression is used to represent the patterns for a token.
Creating Lexical Analyzer with LEX
Lex source program (lex.l) → Lex compiler → lex.yy.c
lex.yy.c → C compiler → a.out
Input sequence → a.out → Stream of tokens
Structure of Lex Program
Any lex program contains mainly three sections
1. Declaration
2. Translation rules
3. Auxiliary Procedures
Structure of Program
Declaration
%%
Translation rule
%%
Auxiliary Procedures
Declaration: It is used to declare variables, constants & regular definitions.
Example:
%{
int x,y;
float rate;
%}
Digit [0-9]
Letter [A-Za-z]
Translation rule: Syntax: Pattern {Action}
Example:
%%
pattern1 {Action1}
pattern2 {Action2}
pattern3 {Action3}
%%
Auxiliary Procedures: All the functions needed are specified over here.
Example: Lex Program
Program: Write a Lex program to recognize identifiers, keywords, relational operators, and numbers.
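The Lex program itself is not reproduced in this text. As a rough analogue, the Python sketch below uses the standard `re` module to play the role of the pattern and action pairs. The keyword set, the token names, and the number pattern are assumptions chosen for illustration. Unlike Lex, Python's alternation takes the first alternative that matches rather than the longest, so the longer operators are listed first.

```python
import re

KEYWORDS = {"if", "then", "else"}   # assumed sample keyword set

# (token name, pattern) pairs, playing the role of Lex translation rules
TOKEN_SPEC = [
    ("number", r"\d+(?:\.\d+)?(?:E[+-]?\d+)?"),
    ("id",     r"[A-Za-z][A-Za-z0-9]*"),
    ("relop",  r"<=|<>|>=|<|>|="),
    ("ws",     r"[ \t\n]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "ws":
            continue                      # white space is discarded
        if kind == "id" and lexeme in KEYWORDS:
            kind = "keyword"              # keywords are identifiers found in the set
        tokens.append((kind, lexeme))
    return tokens
```

Matching keywords through the identifier pattern and then looking them up is one common design; a Lex specification could instead list each keyword as its own rule ahead of the identifier rule.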
Thank You