You are on page 1of 89

Compiler Design (CD)

GTU # 3170701

Unit – 2
Lexical Analyzer

Prof. Dixita B. Kagathara


Computer Engineering Department
Darshan Institute of Engineering & Technology, Rajkot
dixita.kagathara@darshan.ac.in
+91 - 97277 47317 (CE Department)
Topics to be covered
 Looping
• Interaction of scanner & parser
• Token, Pattern & Lexemes
• Input buffering
• Specification of tokens
• Regular expression & Regular definition
• Transition diagram
• Hard coding & automatic generation lexical analyzers
• Finite automata
• Regular expression to NFA using Thompson's rule
• Conversion from NFA to DFA using subset construction method
• DFA optimization
• Conversion from regular expression to DFA
• An Elementary Scanner Design & It’s Implementation
Interaction with Scanner & Parser
Interaction of scanner & parser
Token
Source Lexical
Parser
Program Analyzer
Get next token

Symbol Table

 Upon receiving a “Get next token” command from parser, the lexical analyzer reads the input
character until it can identify the next token.
 Lexical analyzer also stripping out comments and white space in the form of blanks, tabs, and
newline characters from the source program.

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 4


Why to separate lexical analysis & parsing?
1. Simplicity in design.
2. Improves compiler efficiency.
3. Enhance compiler portability.

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 5


Token, Pattern & Lexemes
Token, Pattern & Lexemes
Token Pattern
Sequence of character having a collective The set of rules called pattern associated
meaning is known as token. with a token.
Categories of Tokens: Example: “non-empty sequence of digits”,
“letter followed by letters and digits”
1. Identifier
2. Keyword
Lexemes
3. Operator
The sequence of character in a source
4. Special symbol
program matched with a pattern for a token
5. Constant is called lexeme.
Example: Rate, DIET, count, Flag

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 7


Example: Token, Pattern & Lexemes
Example: total = sum + 45
Tokens:
total Identifier1

= Operator1

sum Identifier2 Tokens


+ Operator2

45 Constant1

Lexemes
Lexemes of identifier: total, sum
Lexemes of operator: =, +
Lexemes of constant: 45
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 8
Input buffering
Input buffering
 There are mainly two techniques for input buffering:
1. Buffer pairs
2. Sentinels

Buffer Pair
 The lexical analysis scans the input string from left to right one character at a time.
 Buffer divided into two N-character halves, where N is the number of character on one disk
block.

: : : E : : = : : Mi : * : : : C: * : * : 2 : eof : : :

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 10


Buffer pairs

: : : E : : = : : Mi : * : : : C: * : * : 2 : eof : : :

forward forward
lexeme_beginnig

 Pointer Lexeme Begin, marks the beginning of the current lexeme.


 Pointer Forward, scans ahead until a pattern match is found.
 Once the next lexeme is determined, forward is set to character at its right end.
 Lexeme Begin is set to the character immediately after the lexeme just found.
 If forward pointer is at the end of first buffer half then second is filled with N input character.
 If forward pointer is at the end of second buffer half then first is filled with N input character.

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 11


Buffer pairs
: : : E : : = : : Mi : * : : : C: * : * : 2 : eof : : :

forward forward forward


lexeme_beginnig
Code to advance forward pointer
if forward at end of first half then begin
reload second half;
forward := forward + 1;
end
else if forward at end of second half then begin
reload first half;
move forward to beginning of first half;
end
else forward := forward + 1;
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 12
Sentinels

: : E : : = : : Mi : * : eof : C: * : * : 2 : eof : : eof

forward
lexeme_beginnig

 In buffer pairs we must check, each time we move the forward pointer that we have not moved
off one of the buffers.
 Thus, for each character read, we make two tests.
 We can combine the buffer-end test with the test for the current character.
 We can reduce the two tests to one if we extend each buffer to hold a sentinel character at the
end.
 The sentinel is a special character that cannot be part of the source program, and a natural
choice is the character EOF.

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 13


Sentinels
: : E : : = : : Mi : * : eof : C: * : * : 2 : eof : : eof

forward forward forward


lexeme_beginnig
forward := forward + 1;
if forward = eof then begin
if forward at end of first half then begin
reload second half;
forward := forward + 1;
end
else if forward at the second half then begin
reload first half;
move forward to beginning of first half;
end
else terminate lexical analysis;
end
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 14
Specification of tokens
Strings and languages
Term Definition
Prefix of s A string obtained by removing zero or more trailing symbol of
string S.
e.g., ban is prefix of banana.
Suffix of S A string obtained by removing zero or more leading symbol of
string S.
e.g., nana is suffix of banana.
Sub string of S A string obtained by removing prefix and suffix from S.
e.g., nan is substring of banana
Proper prefix, suffix Any nonempty string x that is respectively proper prefix, suffix or
and substring of S substring of S, such that s≠x.
Subsequence of S A string obtained by removing zero or more not necessarily
contiguous symbol from S.
e.g., baaa is subsequence of banana.

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 16


Exercise
 Write prefix, suffix, substring, proper prefix, proper suffix and subsequence of following string:
String: Compiler

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 17


Operations on languages
Operation Definition
Union of L and M
Written L U M
Concatenation of L
and M
Written LM
Kleene closure of L
Written L∗
Positive closure of L
Written L+

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 18


Regular Expression & Regular
Definition
Regular expression
 A regular expression is a sequence of characters that define a pattern.
Notational shorthand's
1. One or more instances: +
2. Zero or more instances: *
3. Zero or one instances: ?
4. Alphabets: Σ

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 20


Rules to define regular expression
1. is a regular expression that denotes , the set containing empty string.
2. If is a symbol in then is a regular expression,
3. Suppose and are regular expression denoting the languages and . Then,
a. is a regular expression denoting
b. is a regular expression denoting
c. * is a regular expression denoting
d. is a regular expression denoting

The language denoted by regular expression is said to be a regular set.

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 21


Regular expression
 L = Zero or More Occurrences of a = a*

*
𝜖
a
aa
aaa Infinite …..
aaaa
aaaaa…..

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 22


Regular expression

+
 L = One or More Occurrences of a = a+

a
aa
aaa Infinite …..
aaaa
aaaaa…..

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 23


Precedence and associativity of operators

Operator Precedence Associative


Kleene * 1 left
Concatenation 2 left
Union | 3 left

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 24


Regular expression examples
1. 0 or 1
𝐒𝐭𝐫𝐢𝐧𝐠𝐬 :𝟎 , 𝟏 𝐑 .𝐄 .=𝟎∨𝟏
2. 0 or 11 or 111
𝐒𝐭𝐫𝐢𝐧𝐠𝐬:𝟎,𝟏𝟏,𝟏𝟏𝟏 𝐑 .𝐄.=𝟎|𝟏𝟏|𝟏𝟏𝟏
3. String having zero or more a.
𝐒𝐭𝐫𝐢𝐧𝐠𝐬:𝛜,𝐚,𝐚𝐚,𝐚𝐚𝐚,𝐚𝐚𝐚𝐚…..𝐑 .𝐄 .=𝐚 ∗
4. String having one or more a.
𝐒𝐭𝐫𝐢𝐧𝐠𝐬:𝐚,𝐚𝐚,𝐚𝐚𝐚 ,𝐚𝐚𝐚𝐚…..𝐑 .𝐄 .=𝐚 +¿
5. Regular expression over that represent all string of length 3.
𝐒𝐭𝐫𝐢𝐧𝐠𝐬:𝐚𝐛𝐜,𝐛𝐜𝐚,𝐛𝐛𝐛,𝐜𝐚𝐛,𝐚𝐛𝐚…. 𝐑.𝐄.=( 𝐚|𝐛|𝐜)( 𝐚|𝐛|𝐜) (𝐚|𝐛|𝐜)
6. All binary string
𝐒𝐭𝐫𝐢𝐧𝐠𝐬:𝟎,𝟏𝟏,𝟏𝟎𝟏,𝟏𝟎𝟏𝟎𝟏,𝟏𝟏𝟏𝟏… +

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 25


Regular expression examples
7. 0 or more occurrence of either a or b or both
𝑺𝒕𝒓𝒊𝒏𝒈𝒔:𝜖,𝒂,𝒂𝒂,𝒂𝒃𝒂𝒃,𝒃𝒂𝒃… 𝑹. 𝑬 .=(𝒂∨𝒃)∗
8. 1 or more occurrence of either a or b or both
𝑺𝒕𝒓𝒊𝒏𝒈𝒔:𝒂,𝒂𝒂,𝒂𝒃𝒂𝒃,𝒃𝒂𝒃,𝒃𝒃𝒃𝒂𝒂𝒂… +

9. Binary no. ends with 0


𝑺𝒕𝒓𝒊𝒏𝒈𝒔:𝟎,𝟏𝟎,𝟏𝟎𝟎,𝟏𝟎𝟏𝟎,𝟏𝟏𝟏𝟏𝟎… *

10. Binary no. ends with 1


𝑺𝒕𝒓𝒊𝒏𝒈𝒔:𝟏,𝟏𝟎𝟏,𝟏𝟎𝟎𝟏,𝟏𝟎𝟏𝟎𝟏,… 𝑹. 𝑬 .=(𝟎∨𝟏)∗ 𝟏
11. Binary no. starts and ends with 1
𝑺𝒕𝒓𝒊𝒏𝒈𝒔:𝟏𝟏,𝟏𝟎𝟏,𝟏𝟎𝟎𝟏,𝟏𝟎𝟏𝟎𝟏,…
12. String starts and ends with same character
𝑺𝒕𝒓𝒊𝒏𝒈𝒔:𝟎𝟎,𝟏𝟎𝟏,𝒂𝒃𝒂,𝒃𝒂𝒂𝒃…

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 26


Regular expression examples
13. All string of a and b starting with a
… *

14. String of 0 and 1 ends with 00


… 𝑹. 𝑬 .=(𝟎∨𝟏)∗𝟎𝟎
15. String ends with abb
… 𝑹. 𝑬 .=(𝒂∨𝒃)∗ 𝒂𝒃𝒃
16. String starts with 1 and ends with 0
… 𝑹. 𝑬 .=𝟏(𝟎∨𝟏)∗𝟎
17. All binary string with at least 3 characters and 3rd character should be zero
… 𝑹.𝑬.= ( 𝟎|𝟏 )( 𝟎|𝟏) 𝟎(𝟎∨𝟏)∗
18. Language which consist of exactly two b’s over the set
… 𝑹. 𝑬 .=𝒂∗𝒃 𝒂∗𝒃𝒂∗
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 27
Regular expression examples
19. The language with such that 3rd character from right end of the string is always a.
… 𝑹.𝑬.=(𝒂∨𝒃)∗𝒂(𝒂∨𝒃)(𝒂∨𝒃)
20. Any no. of followed by any no. of followed by any no. of
… 𝑹. 𝑬 .=𝒂∗𝒃∗𝒄∗
21. String should contain at least three ∗ ∗ ∗ ∗
…. 𝑹.𝑬.=(𝟎∨𝟏) 𝟏(𝟎∨𝟏) 𝟏(𝟎∨𝟏) 𝟏(𝟎∨𝟏)
22. String should contain exactly two ∗ ∗ ∗
…. 𝑹 . 𝑬 .=𝟎 𝟏 𝟎 𝟏 𝟎
23. ….
𝑹. 𝑬.= ( 𝟎∨𝟏 )|( 𝟎∨𝟏 )( 𝟎∨𝟏)|( 𝟎∨𝟏 )( 𝟎∨𝟏 )( 𝟎∨𝟏)
Length of string should be at least 1 and at most 3

24. No. of zero should be multiple of 3 ∗ ∗ ∗ ∗ ∗


…. 𝑹. 𝑬 .=(𝟏 𝟎𝟏 𝟎𝟏 𝟎𝟏 )
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 28
Regular expression examples
25. The language with where should be multiple of 3
𝑺𝒕𝒓𝒊𝒏𝒈𝒔:𝒂𝒂𝒂,𝒃𝒂𝒂𝒂,𝒃𝒂𝒄𝒂𝒃𝒂,𝒂𝒂𝒂𝒂𝒂𝒂.. ∗ ∗ ∗
𝑹.𝑬.=(( 𝒃∨𝒄 ) 𝒂 ( 𝒃∨𝒄 ) 𝒂 ( 𝒃∨𝒄 ) 𝒂 ( 𝒃∨𝒄 ) )
∗∗

26. Even no. of 0


∗ ∗ ∗ ∗
…. 𝑹 . 𝑬 .=(𝟏 𝟎 𝟏 𝟎𝟏 )
27. String should have odd length ∗
…. 𝑹. 𝑬 .= ( 𝟎∨𝟏 ) (( 𝟎|𝟏 ) (𝟎∨𝟏))
28. String should have even length ∗
…. 𝑹 . 𝑬 .=( ( 𝟎|𝟏 ) (𝟎∨𝟏))
29. String start with 0 and has odd length ∗
…. 𝑹. 𝑬 .= ( 𝟎 ) ( ( 𝟎|𝟏 ) (𝟎∨𝟏))
30. String start with 1 and has even length ∗
…. 𝑹. 𝑬 .=𝟏(𝟎∨𝟏)(( 𝟎|𝟏 ) (𝟎∨𝟏))
31. All string begins or ends with 00 or 11
𝑺𝒕𝒓𝒊𝒏𝒈𝒔:𝟎𝟎𝟏𝟎𝟏,𝟏𝟎𝟏𝟎𝟎,𝟏𝟏𝟎,𝟎𝟏𝟎𝟏𝟏… 𝑹.𝑬.=(𝟎𝟎∨𝟏𝟏)(𝟎∨𝟏)∗∨ (𝟎|𝟏) ∗(𝟎𝟎∨𝟏𝟏)
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 29
Regular expression examples
32. Language of all string containing both 11 and 00 as substring
𝑺𝒕𝒓𝒊𝒏𝒈𝒔:𝟎𝟎𝟏𝟏,𝟏𝟏𝟎𝟎,𝟏𝟎𝟎𝟏𝟏𝟎,𝟎𝟏𝟎𝟎𝟏𝟏…
33. String ending with 1 and not contain 00
𝑺𝒕𝒓𝒊𝒏𝒈𝒔:𝟎𝟏𝟏,𝟏𝟏𝟎𝟏,𝟏𝟎𝟏𝟏…. 𝑹 . 𝑬 .= ( 𝟏|𝟎𝟏 ) +¿
34. Language of identifier
𝑺𝒕𝒓𝒊𝒏𝒈𝒔:𝒂𝒓𝒆𝒂,𝒊,𝒓𝒆𝒅𝒊𝒐𝒖𝒔,𝒈𝒓𝒂𝒅𝒆𝟏…. 𝑹. 𝑬 .=(¿+ 𝑳)(¿+ 𝑳+𝑫) ∗

𝒘𝒉𝒆𝒓𝒆 𝑳𝒊𝒔𝑳𝒆𝒕𝒕𝒆𝒓∧𝐃𝐢𝐬𝐝𝐢𝐠𝐢𝐭

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 30


Regular definition
 A regular definition gives names to certain regular expressions and uses those names in other
regular expressions.
 Regular definition is a sequence of definitions of the form:

……

Where is a distinct name & is a regular expression.


• Example: Regular definition for identifier
letter  A|B|C|………..|Z|a|b|………..|z
digit  0|1|…….|9|
id letter (letter | digit)*

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 31


Regular definition example
 Example: Unsigned Pascal numbers
3
5280
39.37
6.336E4
1.894E-4
2.56E+7
Regular Definition
digit  0|1|…..|9
digits  digit digit*
optional_fraction  .digits | 𝜖
optional_exponent  (E(+|-|𝜖)digits)|𝜖
num  digits optional_fraction optional_exponent

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 32


Transition Diagram
Transition Diagram
 A stylized flowchart is called transition diagram.

is a state

is a transition

is a start state

is a final state

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 34


Transition Diagram : Relational operator

<
0 1
=
2 return (relop,LE)

>
3 return (relop,NE)
=
other
5
4 return (relop,LT)
return (relop,EQ)
>

6 =
7 return (relop,GE)

other
8 return (relop,GT)

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 35


Transition diagram : Unsigned number

digit digit digit

start digit . digit other


1 2
digit +or -
3 4 5 6 7
E
8

E digit
3
5280
39.37
1.894 E - 4
2.56 E + 7
45 E + 6
96 E 2
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 36
Hard coding & automatic
generation Lexical analyzers
Hard coding and automatic generation lexical analyzers
 Lexical analysis is about identifying the pattern from the input.
 To recognize the pattern, transition diagram is constructed.
 It is known as hard coding lexical analyzer.
 Example: to represent identifier in ‘C’, the first character must be letter and other characters are
either letter or digits.
 To recognize this pattern, hard coding lexical analyzer will work with a transition diagram.
 The automatic generation lexical analyzer takes special notation as input.
 For example, lex compiler tool will take regular expression as input and finds out the pattern
matching to that regular expression.

Letter or digit

Start Letter
1 2 3

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 38


Finite Automata
Finite Automata
 Finite Automata are recognizers.
 FA simply say “Yes” or “No” about each possible input string.
 Finite Automata is a mathematical model consist of:
1. Set of states
2. Set of input symbol
3. A transition function move
4. Initial state
5. Final states or accepting states

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 40


Types of finite automata
 Types of finite automata are:
DFA
b
 Deterministic finite automata (DFA): have for
each state exactly one edge leaving out for a b b
1 2 3 4
each symbol.
a
a
b a
NFA DFA
 Nondeterministic finite automata (NFA): a
There are no restrictions on the edges
leaving a state. There can be several with the a b b
1 2 3 4
same symbol as label and some edges can
be labeled with .
b NFA
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 41
Regular expression to NFA using
Thompson's rule
Regular expression to NFA using Thompson's rule
1. For , construct the NFA 3. For regular expression

𝑖 𝑓
start
start N(t)
𝑖 𝑓
𝜖 N(s)

2. For in , construct the NFA Ex: ab

𝑖 𝑓
start a a b
1 2 3

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 43


Regular expression to NFA using Thompson's rule
4. For regular expression 5. For regular expression *
𝜖
N(s) 𝜖
𝜖

𝑖
𝜖 𝜖
𝑓
start
𝑖 𝑓
start N(s)

𝜖 N(t) 𝜖 𝜖

Ex: (a|b) Ex: a*


𝜖
a
2 3
𝜖 𝜖 𝑎 𝜖

1
𝜖

6
1 2 3 4
𝜖 𝜖 𝜖
4 5
b
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 44
Regular expression to NFA using Thompson's rule
 a*b

𝜖 𝑎 𝜖 𝑏
1 2 3 4 5
𝜖

 b*ab
𝜖

𝜖 𝑏 𝜖 𝑎 𝑏
1 2 3 4 5 6
𝜖

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 45


Exercise
Convert following regular expression to NFA:
1. abba
2. bb(a)*
3. (a|b)*
4. a* | b*
5. a(a)*ab
6. aa*+ bb*
7. (a+b)*abb
8. 10(0+1)*1
9. (a+b)*a(a+b)
10. (0+1)*010(0+1)*
11. (010+00)*(10)*
12. 100(1)*00(0+1)*
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 46
Conversion from NFA to DFA
using subset construction
method
Subset construction algorithm
Input: An NFA .
Output: A DFA D accepting the same language.
Method: Algorithm construct a transition table for D. We use the following operation:

OPERATION DESCRIPTION
Set of NFA states reachable from NFA state on –
transition alone.
Set of NFA states reachable from some NFA state
in on – transition alone.
Set of NFA states to which there is a transition on
input symbol from some NFA state in .

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 48


Subset construction algorithm
initially be the only state in and it is unmarked;
while there is unmarked states T in do begin
mark ;
for each input symbol do begin

if is not in then
add as unmarked state to

end
end

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 49


Conversion from NFA to DFA

(a|b)*abb 𝜖

a
2 3
𝜖 𝜖
𝜖 𝜖 a b b
0 1 6 7 8 9 10

𝜖 𝜖
4 5
b

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 50


Conversion from NFA to DFA

a
2 3
𝜖 𝜖
𝜖 𝜖 a b b
0 1 6 7 8 9 10

𝜖 𝜖
4 5
b

𝜖- Closure(0)= {0, 1, 7, 2, 4}

= {0,1,2,4,7} ---- A

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 51


Conversion from NFA to DFA

a
2 3 States a b
𝜖 𝜖
A = {0,1,2,4,7} B
𝜖 𝜖 a b b
0 1 6 7 8 9 10 B = {1,2,3,4,6,7,8}

𝜖 𝜖
4 5
b

𝜖
A= {0, 1, 2, 4, 7}
Move(A,a) = {3,8}
𝜖- Closure(Move(A,a)) = {3, 6, 7, 1, 2, 4, 8}
= {1,2,3,4,6,7,8} ---- B
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 52
Conversion from NFA to DFA

a
2 3 States a b
𝜖 𝜖
A = {0,1,2,4,7} B C
𝜖 𝜖 a b b
0 1 6 7 8 9 10 B = {1,2,3,4,6,7,8}
C = {1,2,4,5,6,7}
𝜖 𝜖
4 5
b

A= {0, 1, 2, 4, 7}
Move(A,b) = {5}
𝜖- Closure(Move(A,b)) = {5, 6, 7, 1, 2, 4}
= {1,2,4,5,6,7} ---- C
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 53
Conversion from NFA to DFA

a
2 3 States a b
𝜖 𝜖
A = {0,1,2,4,7} B C
𝜖 𝜖 a b b
0 1 6 7 8 9 10 B = {1,2,3,4,6,7,8} B
C = {1,2,4,5,6,7}
𝜖 𝜖
4 5
b

𝜖
B = {1, 2, 3, 4, 6, 7, 8}
Move(B,a) = {3,8}
𝜖- Closure(Move(B,a)) = {3, 6, 7, 1, 2, 4, 8}
= {1,2,3,4,6,7,8} ---- B
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 54
Conversion from NFA to DFA

a
2 3 States a b
𝜖 𝜖
A = {0,1,2,4,7} B C
𝜖 𝜖 a b b
0 1 6 7 8 9 10 B = {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7}
𝜖 𝜖
4 5 D = {1,2,4,5,6,7,9}
b

B= {1, 2, 3, 4, 6, 7, 8}
Move(B,b) = {5,9}
𝜖- Closure(Move(B,b)) = {5, 6, 7, 1, 2, 4, 9}
= {1,2,4,5,6,7,9} ---- D
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 55
Conversion from NFA to DFA

a
2 3 States a b
𝜖 𝜖
A = {0,1,2,4,7} B C
𝜖 𝜖 a b b
0 1 6 7 8 9 10 B = {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7} B
𝜖 𝜖
4 5 D = {1,2,4,5,6,7,9}
b

C= {1, 2, 4, 5, 6 ,7}
Move(C,a) = {3,8}
𝜖- Closure(Move(C,a)) = {3, 6, 7, 1, 2, 4, 8}
= {1,2,3,4,6,7,8} ---- B
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 56
Conversion from NFA to DFA

a
2 3 States a b
𝜖 𝜖
A = {0,1,2,4,7} B C
𝜖 𝜖 a b b
0 1 6 7 8 9 10 B = {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7} B C
𝜖 𝜖
4 5 D = {1,2,4,5,6,7,9}
b

𝜖
C= {1, 2, 4, 5, 6, 7}
Move(C,b) = {5}
𝜖- Closure(Move(C,b))= {5, 6, 7, 1, 2, 4}
= {1,2,4,5,6,7} ---- C
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 57
Conversion from NFA to DFA

a
2 3 States a b
𝜖 𝜖
A = {0,1,2,4,7} B C
𝜖 𝜖 a b b
0 1 6 7 8 9 10 B = {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7} B C
𝜖 𝜖
4 5 D = {1,2,4,5,6,7,9} B
b

D= {1, 2, 4, 5, 6, 7, 9}
Move(D,a) = {3,8}
𝜖- Closure(Move(D,a)) = {3, 6, 7, 1, 2, 4, 8}
= {1,2,3,4,6,7,8} ---- B
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 58
Conversion from NFA to DFA

a
2 3 States a b
𝜖 𝜖
A = {0,1,2,4,7} B C
𝜖 𝜖 a b b
0 1 6 7 8 9 10 B = {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7} B C
𝜖 𝜖
4 5 D = {1,2,4,5,6,7,9} B E
b
E = {1,2,4,5,6,7,10}
𝜖

D= {1, 2, 4, 5, 6, 7, 9}
Move(D,b) = {5,10}
𝜖- Closure(Move(D,b)) = {5, 6, 7, 1, 2, 4, 10}
= {1,2,4,5,6,7,10} ---- E
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 59
Conversion from NFA to DFA

a
2 3 States a b
𝜖 𝜖
A = {0,1,2,4,7} B C
𝜖 𝜖 a b b
0 1 6 7 8 9 10 B = {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7} B C
𝜖 𝜖
4 5 D = {1,2,4,5,6,7,9} B E
b
E = {1,2,4,5,6,7,10} B
𝜖
E= {1, 2, 4, 5, 6, 7, 10}
Move(E,a) = {3,8}
𝜖- Closure(Move(E,a)) = {3, 6, 7, 1, 2, 4, 8}
= {1,2,3,4,6,7,8} ---- B
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 60
Conversion from NFA to DFA

a
2 3 States a b
𝜖 𝜖
A = {0,1,2,4,7} B C
𝜖 𝜖 a b b
0 1 6 7 8 9 10 B = {1,2,3,4,6,7,8} B D
C = {1,2,4,5,6,7} B C
𝜖 𝜖
4 5 D = {1,2,4,5,6,7,9} B E
b
E = {1,2,4,5,6,7,10} B C
𝜖

E= {1, 2, 4, 5, 6, 7, 10}
Move(E,b)= {5}
𝜖- Closure(Move(E,b))= {5,6,7,1,2,4}
= {1,2,4,5,6,7} ---- C
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 61
Conversion from NFA to DFA

b
States
A = {0,1,2,4,7}
a
B
b
C
a B a
D
B = {1,2,3,4,6,7,8}
C = {1,2,4,5,6,7}
B
B
D
C
A a a b

D = {1,2,4,5,6,7,9}
E = {1,2,4,5,6,7,10}
B
B
E
C
b
C b
E
Transition Table
b
Note:
• Accepting state in NFA is 10 DFA
• 10 is element of E
• So, E is acceptance state in DFA

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 62


Exercise
 Convert following regular expression to DFA using subset construction method:
1. (a+b)*a(a+b)
2. (a+b)*ab*a

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 63


DFA optimization
DFA optimization
1. Construct an initial partition of the set of states with two groups: the accepting states and
the non-accepting states .
2. Apply the repartition procedure to to construct a new partition .
3. If , let and continue with step (4). Otherwise, repeat step (2) with .
for each group of do begin
partition into subgroups such that two states and
of are in the same subgroup if and only if for all
input symbols , states and have transitions on
to states in the same group of .
replace in by the set of all subgroups formed.
end

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 65


DFA optimization
4. Choose one state in each group of the partition as the representative for that group. The
representatives will be the states of . Let s be a representative state, and suppose on input a
there is a transition of from to . Let be the representative of s group. Then has a transition
from to on . Let the start state of be the representative of the group containing start state
of , and let the accepting states of be the representatives that are in .
5. If has a dead state , then remove from . Also remove any state not reachable from the start
state.

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 66


DFA optimization

States a b
{𝐴,𝐵,𝐶 , 𝐷, 𝐸}
A B C
B B D
Accepting States
Nonaccepting States C B C
D B E

{𝐷} E B C

States a b
A B A
B B D
 Now no more splitting is possible. D B E
E B A
 If we chose A as the representative for group
Optimized
(AC), then we obtain reduced transition table Transition Table

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 67


Conversion from regular
expression to DFA
Rules to compute nullable, firstpos, lastpos
 nullable(n)
 The subtree at node generates languages including the empty string.

 firstpos(n)
 The set of positions that can match the first symbol of a string generated by the subtree at node

 lastpos(n)
 The set of positions that can match the last symbol of a string generated be the subtree at node

 followpos(i)
 The set of positions that can follow position in the tree.

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 69


Rules to compute nullable, firstpos, lastpos

Node n nullable(n) firstpos(n) lastpos(n)


A leaf labeled by true
A leaf with
false
position
firstpos(c1) lastpos(c1)
n
¿ nullable(c1)
or  
c1 c2 nullable(c2) firstpos(c2) lastpos(c2)
if (nullable(c1)) if (nullable(c2))
n . nullable(c1)
thenfirstpos(c1)  then lastpos(c1) 
c1
and
c2 firstpos(c2) lastpos(c2)
nullable(c2)
else firstpos(c1) else lastpos(c2)
n ∗
true firstpos(c1) lastpos(c1)
c1

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 70


Rules to compute followpos
1. If n is concatenation node with left child c1 and right child c2 and i is a position in
lastpos(c1), then all position in firstpos(c2) are in followpos(i)

2. If n is * node and i is position in lastpos(n), then all position in firstpos(n) are in followpos(i)

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 71


Conversion from regular expression to DFA
Step 1: Construct Syntax Tree
(a|b) * abb #
Step 2: Nullable node
False .
False . A leaf labeled by
¿𝟔 A leaf with position
False .
𝑏 False
False .
𝑏
𝟓
False
n
¿ nullable(c1)
or
𝟒 c1 c2 nullable(c2)
True ∗ 𝑎 False
𝟑 n ∗
False true
False ¿ c1

n nullable(c1)
𝑎 𝑏 . and
𝟏 𝟐 nullable(c2)
False False Here, * is only nullable node c1 c2
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 72
Conversion from regular expression to DFA

Step 3: Calculate firstpos


Firstpos
{1,2,3 } .
{1,2,3 } . A leaf with position
{6} ¿𝟔
{1,2,3 } .
{5} 𝑏
{1,2,3 } . 𝟓
n
¿ firstpos(c1)  firstpos(c2)
{4} 𝑏
c1 c2
𝟒
{1,2} ∗ {3} 𝑎 n ∗
𝟑 firstpos(c1)

{1,2} ¿ c1

n if (nullable(c1))
.
𝑎 𝑏 thenfirstpos(c1)  firstpos(c2)
{1} 𝟏 {2} 𝟐 c1 else firstpos(c1)
c2

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 73


Conversion from regular expression to DFA

Step 3: Calculate lastpos


Lastpos
{1,2,3 } . {6}
{1,2,3 } . {5}
{6} ¿{6} A leaf with position

{1,2,3 } . {4} 𝟔
{5} 𝑏{5}
{1,2,3 } . {3} {4} 𝑏{4} 𝟓
n
¿ lastpos(c1)  lastpos(c2)
c1 c2
𝟒
{1,2} ∗ {1,2} {3} 𝑎{3} n ∗
𝟑 lastpos(c1)

{1,2} ¿{1,2} n
c1

if (nullable(c2)) then
.
𝑎 𝑏 lastpos(c1)  lastpos(c2)
{1} 𝟏{1} {2} 𝟐{2} c1 c2 else lastpos(c2)

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 74


Conversion from regular expression to DFA

Step 4: Calculate followpos Position followpos Rule:


5 If n is * node and i is position in
Firstpos {1,2,3 } . {6} lastpos(n), then all position in
Lastpos 4 firstpos(n) are in followpos(i)
{1,2,3 } . {5} 3
{6} ¿{6}
{1,2,3 } . {4} 𝟔 2 1,2,
{5} 𝑏{5} 1 1,2,
{1,2,3 } . {3} {4} 𝑏{4} 𝟓
𝟒 {1,2} * {1,2}
{1,2} ∗ {1,2} {3} 𝑎{3} 𝒏
𝟑
{1,2} ¿{1,2}
𝑎 𝑏
{1} 𝟏{1} {2} 𝟐{2}

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 75


Conversion from regular expression to DFA

Step 4: Calculate followpos Rule:


Position followpos
If n is concatenation node
5 with left child c1 and right
Firstpos {1,2,3 } . {6} 4 child c2 and i is a position in
Lastpos lastpos(c1), then all position
{1,2,3 } . {5} 3
{6} ¿{6} in firstpos(c2) are in
1,2, 3 followpos(i)
{1,2,3 } . {4} 𝟔 2
{5} 𝑏{5} 1 1,2, 3
{1,2,3 } . {3} {4} 𝑏{4} 𝟓 .
𝟒
{1,2} ∗ {1,2} {3} 𝑎{3} {1,2} 𝒄𝟏{1,2} {3} 𝒄𝟐{3}
𝟑
{1,2} ¿{1,2}
𝑎 𝑏
{1} 𝟏{1} {2} 𝟐{2}

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 76


Conversion from regular expression to DFA

Step 4: Calculate followpos Position followpos Rule:


If n is concatenation node
5
with left child c1 and right
Firstpos {1,2,3 } . {6} 4 child c2 and i is a position in
Lastpos lastpos(c1), then all position
{1,2,3 } . {5} 3 4
{6} ¿{6} 2 1,2, 3
in firstpos(c2) are in
followpos(i)
{1,2,3 } . {4} 𝟔
{5} 𝑏{5} 1 1,2, 3
{1,2,3 } . {3} {4} 𝑏{4} 𝟓 .
𝟒
{1,2} ∗ {1,2} {3} 𝑎{3} {1,2,3 } 𝒄𝟏{3} {4} 𝒄𝟐{4}
𝟑
{1,2} ¿{1,2}
𝑎 𝑏
{1} 𝟏{1} {2} 𝟐{2}

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 77


Conversion from regular expression to DFA

Step 4: Calculate followpos Position followpos Rule:


If n is concatenation node
5
with left child c1 and right
Firstpos {1,2,3 } . {6} 4 5 child c2 and i is a position in
Lastpos lastpos(c1), then all position
{1,2,3 } . {5} 3 4
{6} ¿{6} 2 1,2, 3
in firstpos(c2) are in
followpos(i)
{1,2,3 } . {4} 𝟔
{5} 𝑏{5} 1 1,2, 3
{1,2,3 } . {3} {4} 𝑏{4} 𝟓 .
𝟒
{1,2} ∗ {1,2} {3} 𝑎{3} {1,2,3 } 𝒄𝟏{4} {5} 𝒄𝟐{5}
𝟑
{1,2} ¿{1,2}
𝑎 𝑏
{1} 𝟏{1} {2} 𝟐{2}

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 78


Conversion from regular expression to DFA

Step 4: Calculate followpos Position followpos Rule:


6 If n is concatenation node
5
with left child c1 and right
Firstpos {1,2,3 } . {6} 4 5 child c2 and i is a position in
Lastpos lastpos(c1), then all position
{1,2,3 } . {5} 3 4
{6} ¿{6} 2 1,2, 3
in firstpos(c2) are in
followpos(i)
{1,2,3 } . {4} 𝟔
{5} 𝑏{5} 1 1,2, 3
{1,2,3 } . {3} {4} 𝑏{4} 𝟓 .
𝟒
{1,2} ∗ {1,2} {3} 𝑎{3} {1,2,3 } 𝒄𝟏{5} {6} 𝒄𝟐{6}
𝟑
{1,2} ¿{1,2}
𝑎 𝑏
{1} 𝟏{1} {2} 𝟐{2}

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 79


Conversion from regular expression to DFA
Initial state = of root = {1,2,3} ----- A
Position followpos
State A
5 6
δ( (1,2,3),a) = followpos(1) U followpos(3) 4 5
=(1,2,3) U (4) = {1,2,3,4} ----- B 3 4
2 1,2,3

δ( (1,2,3),b) = followpos(2) 1 1,2,3

=(1,2,3) ----- A
States a b
A={1,2,3} B A
B={1,2,3,4}

Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 80


Conversion from regular expression to DFA
State B
Position followpos
δ( (1,2,3,4),a) = followpos(1) U followpos(3)
5 6
=(1,2,3) U (4) = {1,2,3,4} ----- B 4 5
3 4
δ( (1,2,3,4),b) = followpos(2) U followpos(4) 2 1,2,3

=(1,2,3) U (5) = {1,2,3,5} ----- C 1 1,2,3

State C
δ( (1,2,3,5),a) = followpos(1) U followpos(3) States a b
A={1,2,3} B A
=(1,2,3) U (4) = {1,2,3,4} ----- B
B={1,2,3,4} B C
C={1,2,3,5} B D
δ( (1,2,3,5),b) = followpos(2) U followpos(5) D={1,2,3,6}

=(1,2,3) U (6) = {1,2,3,6} ----- D


Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 81
Conversion from regular expression to DFA
State D
Position followpos
δ( (1,2,3,6),a) = followpos(1) U followpos(3)
5 6
=(1,2,3) U (4) = {1,2,3,4} ----- B 4 5
3 4
δ( (1,2,3,6),b) = followpos(2) 2 1,2,3

=(1,2,3) ----- A 1 1,2,3

b
a States a b
A={1,2,3} B A
a b b B={1,2,3,4} B C
A B C D
C={1,2,3,5} B D
a
a D={1,2,3,6} B A
b

DFA
Prof. Dixita B Kagathara #3170701 (CD)  Unit 2 – Lexical Analysis 82
An Elementary Scanner Design &
It’s Implementation
An Elementary Scanner Design & It’s Implementation
Tasks of Scanner
1. The main purpose of the scanner is to return the next input token to the parser. 
2. The scanner must identify the complete token and sometimes differentiate between keywords
and identifiers.
3. The scanner may perform symbol-table maintenance, inserting identifiers, literals, and constants
into the tables.
4. The scanner also eliminate the white spaces.
Regular Expression: Tokens can be Specified using regular expression.
Example: id letter(letter | digit)*

Transition Diagram: Finite-state diagrams or transition diagrams are often used to recognize a


token
digit digit

1
. 2
digit
3
Prof. Dixita
Jay R Dhamsaniya
B Kagathara #3130006
#3170701 (PS)
(CD)  Unit
Unit12––Basic
Lexical
Probability
Analysis 84
Implementation of Lexical Analyzer (Lex)
 Lex is tool or a language which is useful for generating a lexical Analyzer and it specifies the
regular expression
 Regular expression is used to represent the patterns for a token.
Creating Lexical Analyzer with LEX
Source
Lex Compiler lex.yy.c
Program
(lex.l)

lex.yy.c C Compiler a.out Lexical Analyzer

Input Sequence
a.out
Stream of token

Prof. Dixita
Jay R Dhamsaniya
B Kagathara #3130006
#3170701(PS)
(CD)   Unit
Unit1 2– –Basic
Lexical
Probability
Analysis 85
Structure of Lex Program
 Any lex program contains mainly three sections
1. Declaration
2. Translation rules
3. Auxiliary Procedures

Structure of Program
Declaration It is used to declare variables, constant & regular definition
Syntax: Pattern {Action}
Example:
Example:
%{
%% %%
int x,y;
Translation rule pattern1 {Action1}
float rate;
%% %}
pattern2 {Action2}
pattern3 {Action3}
Digit [0-9]
%%
Auxiliary Procedures Letter [A-ZAll a-Z]
the function needed are specified over here.

Prof. Dixita
Jay R Dhamsaniya
B Kagathara #3130006
#3170701 (PS)
(CD)  Unit
Unit12––Basic
Lexical
Probability
Analysis 86
Example: Lex Program
 Program: Write Lex program to recognize identifier, keywords, relational operator and numbers

/* Declaration */ /* Translation rule */


%{ %%
/* Lex program for recognizing tokens */ {Id} {printf(“%s is an identifier”,yytext);}
%} If {printf(“%s is a keyword”,yytext);}
Letter [a-z A-z] else {printf(“%s is a keyword”,yytext);}
Digit [0-9] “<” {printf(“%s is a less then operator”,yytext);}
Id {Letter}({Letter}|{Digit})* “>=” {printf(“%s is a greater then equal to operator”,yytext);}
Numbers {Digit}+ (.{Digit}+)? (E[+ -]? Digit+)? {Numbers} {printf(“%s is a number”,yytext);}
%%
/* Auxiliary Procedures */
install_id() Input string: If year < 2021
{
/* procedure to lexeme into the symbol table and return a pointer */
}
Prof. Dixita
Jay R Dhamsaniya
B Kagathara #3130006
#3170701(PS)
(CD)   Unit
Unit1 2– –Basic
Lexical
Probability
Analysis 87
References
Books:
1. Compilers Principles, Techniques and Tools, PEARSON Education (Second Edition)
Authors: Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman
2. Compiler Design, PEARSON (for Gujarat Technological University)
Authors: Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman

Prof. Dixita
Jay R Dhamsaniya
B Kagathara #3130006
#3170701(PS)
(CD)   Unit
Unit1 2– –Basic
Lexical
Probability
Analysis 88
Thank You

You might also like