
UNIT-2

LEXICAL ANALYZER

Introduction
 Lexical analysis is the first phase of a compiler. It takes the
modified source code from language preprocessors, written in the form of
sentences. The lexical analyzer breaks this input into a series of tokens,
removing any whitespace and comments from the source code.

 If the lexical analyzer finds a token invalid, it generates an error.


The lexical analyzer works closely with the syntax analyzer. It reads
character streams from the source code, checks for legal tokens,
and passes the data to the syntax analyzer when the syntax analyzer demands it.
The Architecture of Lexical Analyzer

• To read the input characters from the source code and produce tokens is the most
important task of a lexical analyzer. The lexical analyzer goes through the entire
source code and identifies each token one by one.
• The scanner produces tokens when they are requested by the parser. The lexical
analyzer skips whitespace and comments while creating these tokens. If any error
occurs, the analyzer correlates the error with the source file and line number.
Roles and Responsibility of Lexical Analyzer

• The lexical analyzer is responsible for removing the white spaces and
comments from the source program.
• It correlates error messages with the source program.
• It helps to identify the tokens.
• It reads the input characters from the source code.
Token, Pattern & Lexemes
Token
A sequence of characters having a collective meaning is known as a token.
Categories of Tokens:
1. Identifier
2. Keyword
3. Operator
4. Special symbol
5. Constant

Pattern
The set of rules associated with a token is called a pattern.
Example: “non-empty sequence of digits”, “letter followed by letters and digits”

Lexeme
The sequence of characters in a source program matched with a pattern for a
token is called a lexeme.
Example: Rate, LDRP, count, Flag
Token, Pattern & Lexemes (Example)
Example: total = sum + 45

Tokens:
total → Identifier1
=     → Operator1
sum   → Identifier2
+     → Operator2
45    → Constant1

Lexemes:
Lexemes of identifier: total, sum
Lexemes of operator: =, +
Lexemes of constant: 45
Attributes for Tokens
 When more than one lexeme can match a pattern, the lexical analyzer
must provide the subsequent compiler phases additional information
about the particular lexeme that matched.
 In many cases the lexical analyzer returns to the parser not only a token
name, but an attribute value that describes the lexeme represented by
the token; the token name influences parsing decisions, while the
attribute value influences translation of tokens after the parse.
 The token names and associated attribute values for the given statement:
 E = M * C ** 2
• <id, pointer to symbol table entry for E>
• <assign-op>
• <id, pointer to symbol table entry for M>
• <mult-op>
• <id, pointer to symbol table entry for C>
• <exp-op>
• <number, integer value 2>
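
As an illustration (a sketch, not from the slides), a token can be handed to the parser as a pair of a token name and an attribute value; the type names and symbol-table indices below are assumptions made for this example:

#include <stdio.h>

/* Hypothetical token representation: a token name plus an attribute value
   (a symbol-table index for identifiers, the numeric value for numbers). */
typedef enum { ID, ASSIGN_OP, MULT_OP, EXP_OP, NUMBER } TokenName;

typedef struct {
    TokenName name;   /* influences parsing decisions           */
    int       attr;   /* influences translation after the parse */
} Token;

int main(void) {
    /* Token stream for E = M * C ** 2, assuming E, M and C get
       symbol-table entries 0, 1 and 2. */
    Token stream[] = {
        {ID, 0}, {ASSIGN_OP, 0}, {ID, 1}, {MULT_OP, 0},
        {ID, 2}, {EXP_OP, 0}, {NUMBER, 2}
    };
    for (int i = 0; i < 7; i++)
        printf("<%d, %d>\n", stream[i].name, stream[i].attr);
    return 0;
}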

Lexical Errors
 A lexical error occurs when a sequence of characters does not
match the pattern of any token.
 Types of Lexical Error:
1. Exceeding length of identifier or numeric constants.
2. Appearance of illegal characters
3. Unmatched string
4. Spelling Error
5. Replacing a character with an incorrect character.
6. Removal of the character that should be present.
7. Transposition of two characters.

Input buffering
 The lexical analyzer would otherwise have to access secondary memory each
time to identify tokens, which is time-consuming and costly. So the input
strings are stored in a buffer and then scanned by the lexical analyzer.
 Lexical Analysis scans input string from left to right one character
at a time to identify tokens. It uses two pointers to scan tokens −
 Begin Pointer (bptr) − It points to the beginning of the string to be
read.
 Look Ahead Pointer (lptr) − It moves ahead to search for the end
of the token.

Input buffering
 A buffer can be divided into two halves. If the lookahead pointer moves past
the halfway point of the first half, the second half is filled with new
characters to be read. If the lookahead pointer moves towards the right end
of the second half, the first half is refilled with new characters, and so on.
Input buffering Techniques
 Sentinels − Each time the forward pointer is advanced, a check must be made
that it has not moved off one half of the buffer; if it has, the other half
must be reloaded. Placing a sentinel character (such as eof) at the end of
each half reduces the number of such checks.
 Buffer Pairs − A specialized buffering technique that decreases the overhead
required to process an input character. It uses two buffers, each of
N-character size, which are reloaded alternately.
 Two pointers, lexemeBegin and forward, are maintained. lexemeBegin points to
the start of the current lexeme being discovered, while forward scans ahead
until a match for a pattern is found. Once the next lexeme is determined,
forward is set to the character at its right end; after the lexeme is
processed, lexemeBegin is set to the character immediately after the lexeme
just found.
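
A minimal sketch of the two-buffer scheme with sentinels (not from the slides; BUF_SIZE, fill_half and the sentinel value are assumptions made for illustration):

#define BUF_SIZE 4096                 /* N characters per half                   */
#define SENTINEL '\0'                 /* assumed end-of-buffer marker            */

static char buf[2 * BUF_SIZE + 2];    /* two halves, each followed by a sentinel */
static char *lexemeBegin = buf;       /* start of the current lexeme             */
static char *forward     = buf;       /* scans ahead for the end of the lexeme   */

/* Assumed helper: reads up to BUF_SIZE source characters into half 0 or 1
   and writes the sentinel right after the last character read. */
extern void fill_half(int half);

/* Advance forward by one character, reloading a half when a sentinel is hit. */
char next_char(void) {
    char c = *forward++;
    if (c == SENTINEL) {
        if (forward == buf + BUF_SIZE + 1) {            /* end of first half   */
            fill_half(1);
            forward = buf + BUF_SIZE + 1;               /* continue in 2nd half */
        } else if (forward == buf + 2 * BUF_SIZE + 2) { /* end of second half  */
            fill_half(0);
            forward = buf;                              /* wrap to first half  */
        } else {
            return SENTINEL;                            /* real end of input   */
        }
        c = *forward++;
    }
    return c;
}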
Specification of tokens
 There are 3 specifications of tokens:
1) Strings: A string is a finite sequence of symbols or characters drawn from
some alphabet. There is also an empty string, which is denoted by ε.
2) Language: A language can be defined as a set of strings over some alphabet.
3) Regular expression: Regular expressions are notations that define a searching
pattern with the help of which we can form a language; each regular expression
represents a language.
Operations on Strings
Prefix of S: A string obtained by removing zero or more trailing symbols of string S.
e.g., ban is a prefix of banana.

Suffix of S: A string obtained by removing zero or more leading symbols of string S.
e.g., nana is a suffix of banana.

Substring of S: A string obtained by removing a prefix and a suffix from S.
e.g., nan is a substring of banana.

Proper prefix, suffix and substring of S: Any nonempty string x that is respectively
a prefix, suffix or substring of S, such that S ≠ x.

Subsequence of S: A string obtained by removing zero or more not necessarily
contiguous symbols from S.
e.g., baaa is a subsequence of banana.
Operations on languages
Union of L and M, written L ∪ M: { s | s is in L or s is in M }

Concatenation of L and M, written LM: { st | s is in L and t is in M }

Kleene closure of L, written L*: zero or more concatenations of L, i.e., the union of L^i for i ≥ 0

Positive closure of L, written L+: one or more concatenations of L, i.e., the union of L^i for i ≥ 1
Regular expression
 The lexical analyzer needs to scan and identify only a finite set of
valid strings/tokens/lexemes that belong to the language in hand. It
searches for the pattern defined by the language rules.
 Regular expressions have the capability to express finite languages
by defining a pattern for finite strings of symbols.
 The grammar defined by regular expressions is known as regular
grammar. The language defined by regular grammar is known
as regular language.

Regular expression
 Notations
• If r and s are regular expressions denoting the languages L(r) and L(s), then
• Union : (r)|(s) is a regular expression denoting L(r) U L(s)
• Concatenation : (r)(s) is a regular expression denoting L(r)L(s)
• Kleene closure : (r)* is a regular expression denoting (L(r))*
• (r) is a regular expression denoting L(r)
 Precedence and Associativity
• *, concatenation (.), and | (pipe sign) are left associative
• * has the highest precedence
• Concatenation (.) has the second highest precedence.
• | (pipe sign) has the lowest precedence of all.

Rules to define regular expression
1. ε is a regular expression that denotes {ε}, the set containing the empty string.
2. If a is a symbol in Σ, then a is a regular expression denoting {a}.
3. Suppose r and s are regular expressions denoting the languages L(r) and L(s).
Then,
a. (r)|(s) is a regular expression denoting L(r) ∪ L(s)
b. (r)(s) is a regular expression denoting L(r)L(s)
c. (r)* is a regular expression denoting (L(r))*
d. (r) is a regular expression denoting L(r)
The language denoted by a regular expression is said to be a regular set.
Regular definition
 A regular definition gives names to certain regular expressions and
uses those names in other regular expressions.
 A regular definition is a sequence of definitions of the form:
d1 → r1
d2 → r2
……
dn → rn
where each di is a distinct name and each ri is a regular expression.

 Example: Regular definition for identifier
letter → A|B|C|………..|Z|a|b|………..|z
digit  → 0|1|…….|9
id     → letter (letter | digit)*
Extensions of Regular Expressions
 Many extensions have been added to regular expressions to enhance their
ability to specify string patterns.
1. One or more instances (+): The unary postfix operator + means “one or more
instances of”.
• If r is a regular expression that denotes the language L(r), then (r)+ is a regular expression
that denotes the language (L(r))+.
• The operator + has the same precedence and associativity as the operator *.
2. Zero or one instance (?): The notation r? is a shorthand for r | ε. If r is a
regular expression, then (r)? is a regular expression that denotes the language
L(r) ∪ {ε}.
3. Character Classes: A way of representing multiple characters.
• The notation [abc], where a, b and c are alphabet symbols, denotes the regular expression
a | b | c.
• A character class such as [a–z] denotes the regular expression a | b | c | d | …. | z.
• We can describe identifiers as strings generated by the regular expression
[A–Za–z][A–Za–z0–9]*.
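
As a quick illustration (a sketch, not part of the slides), the identifier pattern above can be tested with the POSIX regex library in C; the anchors ^ and $ force the whole string to match:

#include <regex.h>
#include <stdio.h>

int main(void) {
    regex_t re;
    /* Identifier pattern from the slide: a letter followed by letters/digits. */
    if (regcomp(&re, "^[A-Za-z][A-Za-z0-9]*$", REG_EXTENDED | REG_NOSUB) != 0)
        return 1;

    const char *samples[] = { "count", "Flag", "2rate", "x25" };
    for (int i = 0; i < 4; i++) {
        int ok = (regexec(&re, samples[i], 0, NULL, 0) == 0);
        printf("%-6s %s\n", samples[i], ok ? "matches id" : "does not match");
    }
    regfree(&re);
    return 0;
}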

Recognition of tokens
 We must study how to take the patterns for all the needed tokens
and build a piece of code that examines the input string and finds
a prefix that is a lexeme matching one of the patterns.
 Starting point is the language grammar to understand the tokens:
stmt -> if expr then stmt
| if expr then stmt else stmt

expr -> term relop term
| term
term -> id
| number

Recognition of tokens (cont.)
 The next step is to formalize the patterns:
digit  -> [0-9]
digits -> digit+
number -> digits (. digits)? (E [+-]? digits)?
letter -> [A-Za-z_]
id     -> letter (letter | digit)*
if     -> if
then   -> then
else   -> else
relop  -> < | > | <= | >= | = | <>
 We also need to handle whitespaces:
ws -> (blank | tab | newline)+

Transition diagrams
Transition diagram for relop

Transition diagrams
Transition diagram for reserved words and identifiers

Transition diagram for whitespace
Architecture of a transition-diagram-based lexical analyzer
TOKEN getRelop()
{
    TOKEN retToken = new(RELOP);
    while (1) {  /* repeat character processing until a
                    return or failure occurs */
        switch (state) {
        case 0: c = nextchar();
            if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else fail();  /* lexeme is not a relop */
            break;
        case 1: …

        case 8: retract();
            retToken.attribute = GT;
            return (retToken);
        }
    }
}
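
For completeness, a sketch of how the elided cases might look, based on the standard relop transition diagram (an assumption, not text from the slides); these cases continue the switch above, with states 2, 3, 5 and 7 accepting without retracting, while states 4 and 8 retract one character:

        case 1: c = nextchar();
            if (c == '=')      { retToken.attribute = LE; return (retToken); }  /* state 2 */
            else if (c == '>') { retToken.attribute = NE; return (retToken); }  /* state 3 */
            else { retract(); retToken.attribute = LT; return (retToken); }     /* state 4 */
        case 5: retToken.attribute = EQ; return (retToken);
        case 6: c = nextchar();
            if (c == '=') { retToken.attribute = GE; return (retToken); }       /* state 7 */
            else state = 8;   /* any other character: state 8 retracts and returns GT */
            break;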

Lexical Analyzer Generator - Lex

Use of Lex
• Lex is a program that generates a lexical analyzer. It is used with the YACC
parser generator.
• The lexical analyzer is a program that transforms an input stream into a
sequence of tokens.
• Lex reads the specification and produces as output C source code
implementing the lexical analyzer.
Structure of Lex programs
 A Lex program is separated into three sections by %% delimiters. The format of a Lex
source file is as follows:
{ definitions }
%%
{ rules }
%%
{ user subroutines }
 Definitions include declarations of constants, variables and regular definitions.
 Rules are statements of the form p1 {action1} p2 {action2} .... pn {actionn},
where each pi describes a regular expression and each actioni describes what action
the lexical analyzer should take when pattern pi matches a lexeme.
 User subroutines are auxiliary procedures needed by the actions. These subroutines
are compiled separately and loaded with the lexical analyzer.
Example
%{
/* definitions of manifest constants
   LT, LE, EQ, NE, GT, GE,
   IF, THEN, ELSE, ID, NUMBER, RELOP */
%}

/* regular definitions */
delim   [ \t\n]
ws      {delim}+
letter  [A-Za-z]
digit   [0-9]
id      {letter}({letter}|{digit})*
number  {digit}+(\.{digit}+)?(E[+-]?{digit}+)?

%%

{ws}      {/* no action and no return */}
if        {return(IF);}
then      {return(THEN);}
else      {return(ELSE);}
{id}      {yylval = (int) installID(); return(ID);}
{number}  {yylval = (int) installNum(); return(NUMBER);}

%%

int installID() { /* function to install the lexeme, whose first character
                     is pointed to by yytext, and whose length is yyleng,
                     into the symbol table and return a pointer thereto */
}

int installNum() { /* similar to installID, but puts numerical constants
                      into a separate table */
}
Finite Automata
 A finite automaton (FA) is a simple idealized machine used to
recognize patterns within input taken from some character set (or
alphabet) C. The job of an FA is to accept or reject an input
depending on whether the pattern defined by the FA occurs in the
input.
 A finite automaton consists of the following:
• Q : Finite set of states
• Σ : set of Input Symbols
• q0 : Initial state.
• F : set of Final States
• δ : Transition Function
 Formal specification of machine is:
{ Q, Σ, q0, F, δ }
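
A minimal sketch (not from the slides) of how the five components {Q, Σ, q0, F, δ} might be represented and simulated in C; the state count and the two-symbol alphabet {a, b} are assumptions made for illustration:

#include <stdbool.h>

#define NSTATES 5        /* |Q|: number of states (assumed)               */
#define NSYMS   2        /* |Σ|: input symbols 'a' (index 0) and 'b' (1)  */

typedef struct {
    int  delta[NSTATES][NSYMS];  /* δ: transition function   */
    int  q0;                     /* initial state            */
    bool final[NSTATES];         /* F: set of final states   */
} DFA;

/* Run the automaton on a string over {a, b} and report acceptance. */
bool accepts(const DFA *m, const char *s) {
    int q = m->q0;
    for (; *s; s++)
        q = m->delta[q][*s - 'a'];   /* map 'a' -> 0, 'b' -> 1 */
    return m->final[q];
}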
Types of finite automata
 There are two types of finite automata:
 Deterministic finite automata (DFA): each state has exactly one outgoing
edge for each input symbol.
 Nondeterministic finite automata (NFA): there are no restrictions on the
edges leaving a state; there can be several edges with the same symbol as
label, and some edges can be labeled with ε.
(Figure: an example DFA and NFA over states 1–4 with edges labeled a and b.)
Regular expression to NFA (Thompson’s construction)
1. For ε, construct the NFA: a new start state i with an ε-transition to a new
accepting state f.

   start → (i) --ε--> ((f))

2. For a symbol a in Σ, construct the NFA: a new start state i with a transition
on a to a new accepting state f.

   start → (i) --a--> ((f))
3. For the regular expression s | t, construct the composite NFA N(s | t): a new
start state i with ε-transitions to the start states of N(s) and N(t), and
ε-transitions from the accepting states of N(s) and N(t) to a new accepting
state f.

Example: (a|b)
1 --ε--> 2, 2 --a--> 3, 3 --ε--> 6
1 --ε--> 4, 4 --b--> 5, 5 --ε--> 6
(start state 1, accepting state 6)
4. For the regular expression st, construct the composite NFA N(st): the accepting
state of N(s) is merged with the start state of N(t), so N(s) is followed directly
by N(t).

Example: ab
1 --a--> 2, 2 --b--> 3
(start state 1, accepting state 3)
5. For the regular expression s*, construct the composite NFA N(s*): a new start
state i and a new accepting state f, with ε-transitions from i to the start of
N(s) and to f, from the accepting state of N(s) back to the start of N(s), and
from the accepting state of N(s) to f.

Example: a*
1 --ε--> 2, 2 --a--> 3, 3 --ε--> 2, 3 --ε--> 4, 1 --ε--> 4
(start state 1, accepting state 4)
Regular expression to NFA examples
• a*b
1 --ε--> 2, 2 --a--> 3, 3 --ε--> 2, 3 --ε--> 4, 1 --ε--> 4, 4 --b--> 5
(start state 1, accepting state 5)

• b*ab
1 --ε--> 2, 2 --b--> 3, 3 --ε--> 2, 3 --ε--> 4, 1 --ε--> 4, 4 --a--> 5, 5 --b--> 6
(start state 1, accepting state 6)
• (c|d)
1 --ε--> 2, 2 --c--> 3, 3 --ε--> 6
1 --ε--> 4, 4 --d--> 5, 5 --ε--> 6
(start state 1, accepting state 6)

• (c|d)*
0 --ε--> 1, 0 --ε--> 7
1 --ε--> 2, 2 --c--> 3, 3 --ε--> 6
1 --ε--> 4, 4 --d--> 5, 5 --ε--> 6
6 --ε--> 1, 6 --ε--> 7
(start state 0, accepting state 7)
Conversion from NFA to DFA using the subset construction method
Subset construction algorithm
initially, ε-closure(s0) is the only state in Dstates, and it is unmarked;
while there is an unmarked state T in Dstates do begin
    mark T;
    for each input symbol a do begin
        U := ε-closure(move(T, a));
        if U is not in Dstates then
            add U as an unmarked state to Dstates;
        Dtran[T, a] := U
    end
end
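
A small sketch (not from the slides) of how ε-closure(T) can be computed with a work-list, as used in the algorithm above; the NFA representation (an eps adjacency matrix and state sets as bitmasks) is an assumption made for illustration:

#include <stdbool.h>

#define MAXSTATES 32

/* eps[s][t] is true when the NFA has an ε-transition s --ε--> t (assumed). */
extern bool eps[MAXSTATES][MAXSTATES];

/* Compute ε-closure of the state set 'in' (a bitmask over states). */
unsigned eps_closure(unsigned in) {
    unsigned closure = in;
    int stack[MAXSTATES], top = 0;

    for (int s = 0; s < MAXSTATES; s++)
        if (in & (1u << s)) stack[top++] = s;

    while (top > 0) {
        int s = stack[--top];
        for (int t = 0; t < MAXSTATES; t++)
            if (eps[s][t] && !(closure & (1u << t))) {
                closure |= 1u << t;      /* t is reachable by ε-moves alone */
                stack[top++] = t;
            }
    }
    return closure;
}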
Conversion from NFA to DFA
NFA for (a|b)*abb (start state 0, accepting state 10):
0 --ε--> 1, 0 --ε--> 7
1 --ε--> 2, 1 --ε--> 4
2 --a--> 3, 4 --b--> 5
3 --ε--> 6, 5 --ε--> 6
6 --ε--> 1, 6 --ε--> 7
7 --a--> 8, 8 --b--> 9, 9 --b--> 10
ε-closure(0) = {0, 1, 7, 2, 4}
             = {0,1,2,4,7} ---- A
Move(A, a) = {3, 8}
ε-closure(Move(A, a)) = {3, 6, 7, 1, 2, 4, 8}
                      = {1,2,3,4,6,7,8} ---- B
Move(A, b) = {5}
ε-closure(Move(A, b)) = {5, 6, 7, 1, 2, 4}
                      = {1,2,4,5,6,7} ---- C
Move(B, a) = {3, 8}
ε-closure(Move(B, a)) = {3, 6, 7, 1, 2, 4, 8}
                      = {1,2,3,4,6,7,8} ---- B
Move(B, b) = {5, 9}
ε-closure(Move(B, b)) = {5, 6, 7, 1, 2, 4, 9}
                      = {1,2,4,5,6,7,9} ---- D
Move(C, a) = {3, 8}
ε-closure(Move(C, a)) = {3, 6, 7, 1, 2, 4, 8}
                      = {1,2,3,4,6,7,8} ---- B
Move(C, b) = {5}
ε-closure(Move(C, b)) = {5, 6, 7, 1, 2, 4}
                      = {1,2,4,5,6,7} ---- C
Move(D, a) = {3, 8}
ε-closure(Move(D, a)) = {3, 6, 7, 1, 2, 4, 8}
                      = {1,2,3,4,6,7,8} ---- B
Move(D, b) = {5, 10}
ε-closure(Move(D, b)) = {5, 6, 7, 1, 2, 4, 10}
                      = {1,2,4,5,6,7,10} ---- E
Move(E, a) = {3, 8}
ε-closure(Move(E, a)) = {3, 6, 7, 1, 2, 4, 8}
                      = {1,2,3,4,6,7,8} ---- B
Move(E, b) = {5}
ε-closure(Move(E, b)) = {5, 6, 7, 1, 2, 4}
                      = {1,2,4,5,6,7} ---- C
DFA
Transition table:

States               a   b
A = {0,1,2,4,7}      B   C
B = {1,2,3,4,6,7,8}  B   D
C = {1,2,4,5,6,7}    B   C
D = {1,2,4,5,6,7,9}  B   E
E = {1,2,4,5,6,7,10} B   C

Note:
• The accepting state in the NFA is 10.
• 10 is an element of E.
• So E is the accepting state of the DFA.
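
A small sketch (not from the slides) that encodes this transition table in C and checks a few strings against it; states A..E are mapped to 0..4, and state E (index 4) is the accepting state:

#include <stdbool.h>
#include <stdio.h>

/* Rows are states A..E (0..4); columns are inputs a (0) and b (1). */
static const int dtran[5][2] = {
    /* A */ {1, 2},   /* a -> B, b -> C */
    /* B */ {1, 3},   /* a -> B, b -> D */
    /* C */ {1, 2},   /* a -> B, b -> C */
    /* D */ {1, 4},   /* a -> B, b -> E */
    /* E */ {1, 2},   /* a -> B, b -> C */
};

static bool matches(const char *s) {
    int state = 0;                        /* start in A          */
    for (; *s; s++)
        state = dtran[state][*s - 'a'];   /* 'a' -> 0, 'b' -> 1  */
    return state == 4;                    /* accept only in E    */
}

int main(void) {
    printf("%d %d %d\n", matches("abb"), matches("aababb"), matches("abab"));
    /* prints: 1 1 0 — only strings of a's and b's ending in abb are accepted */
    return 0;
}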
DFA Optimization Algorithm
1. Construct an initial partition Π of the set of states with two groups: the
accepting states F and the non-accepting states S − F.
2. Apply the repartition procedure below to Π to construct a new partition Πnew.
3. If Πnew = Π, let Πfinal = Π and continue with step (4). Otherwise, repeat
step (2) with Π = Πnew.
4. Choose one state in each group of Πfinal as the representative for that group;
the representatives are the states of the reduced DFA.

for each group G of Π do begin
    partition G into subgroups such that two states s and t of G are in
    the same subgroup if and only if, for all input symbols a, states s
    and t have transitions on a to states in the same group of Π;
    replace G in Πnew by the set of all subgroups formed
end
DFA Optimization
Initial partition of {A, B, C, D, E}: accepting states {E} and nonaccepting states {A, B, C, D}.

Transition table:
States  a  b
A       B  C
B       B  D
C       B  C
D       B  E
E       B  C

On input b, D moves to the accepting group while A, B and C do not, so D is split off: {A, B, C} {D} {E}.
On input b, B moves to {D} while A and C do not, so B is split off: {A, C} {B} {D} {E}.

• Now no more splitting is possible.
• If we choose A as the representative for the group (A, C), then we obtain the
reduced transition table:

States  a  b
A       B  A
B       B  D
D       B  E
E       B  A
Conversion from regular expression to DFA
 A regular expression is represented as a syntax tree where interior nodes correspond
to operators representing union, concatenation and closure operations.
 Leaf nodes correspond to the input symbols.
 A DFA can be constructed directly from a regular expression by computing the
functions nullable(n), firstpos(n), lastpos(n) and followpos(i) from the syntax tree.
 nullable(n): true if the sub-expression rooted at n can generate the empty string ε;
false otherwise.
 firstpos(n): the set of positions that can match the first symbol of a string
generated by the sub-expression rooted at n.
 lastpos(n): the set of positions that can match the last symbol of a string
generated by the sub-expression rooted at n.
 followpos(i): the set of positions that can follow position i in some string
generated by the given regular expression.
Rules for computing nullable, firstpos and lastpos
Node n: a leaf labeled ε
  nullable(n) = true;  firstpos(n) = Ø;  lastpos(n) = Ø

Node n: a leaf with position i
  nullable(n) = false;  firstpos(n) = {i};  lastpos(n) = {i}

Node n: an or node n = c1 | c2
  nullable(n) = nullable(c1) or nullable(c2)
  firstpos(n) = firstpos(c1) ∪ firstpos(c2)
  lastpos(n)  = lastpos(c1) ∪ lastpos(c2)

Node n: a cat node n = c1 c2
  nullable(n) = nullable(c1) and nullable(c2)
  firstpos(n) = if nullable(c1) then firstpos(c1) ∪ firstpos(c2) else firstpos(c1)
  lastpos(n)  = if nullable(c2) then lastpos(c1) ∪ lastpos(c2) else lastpos(c2)

Node n: a star node n = c1*
  nullable(n) = true;  firstpos(n) = firstpos(c1);  lastpos(n) = lastpos(c1)
Rules to compute followpos
 One position of a regular expression can follow another in the
following ways:
 If n is a cat node with left child c1 and right child c2, then for
every position i in lastpos(c1), all positions in firstpos(c2) are
in followpos(i).
 For cat node, for each position i in lastpos of its left
child, the firstpos of its right child will be in followpos(i).
 If n is a star node and i is a position in lastpos(n), then all
positions in firstpos(n) are in followpos(i).
 For star node, the firstpos of that node is in followpos of all
positions in lastpos of that node.
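
A minimal sketch (not from the slides) of these two followpos rules, assuming a syntax-tree node type whose firstpos/lastpos sets have already been computed and are stored as bitmasks (position i is bit i):

#include <stdint.h>

typedef enum { LEAF, CAT, OR, STAR } Kind;

typedef struct Node {
    Kind kind;
    struct Node *left, *right;   /* right is unused for STAR and LEAF nodes  */
    uint32_t firstpos, lastpos;  /* assumed to be filled by a bottom-up pass */
} Node;

/* followpos[i] accumulates the positions that can follow position i. */
void add_followpos(const Node *n, uint32_t followpos[32]) {
    if (n == NULL) return;
    add_followpos(n->left, followpos);
    add_followpos(n->right, followpos);

    if (n->kind == CAT) {
        /* Rule 1: for each i in lastpos(left child), add firstpos(right child)
           to followpos(i). */
        for (int i = 0; i < 32; i++)
            if (n->left->lastpos & (1u << i))
                followpos[i] |= n->right->firstpos;
    } else if (n->kind == STAR) {
        /* Rule 2: for each i in lastpos(n), add firstpos(n) to followpos(i). */
        for (int i = 0; i < 32; i++)
            if (n->lastpos & (1u << i))
                followpos[i] |= n->firstpos;
    }
}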
Regular Expression ( a | b )* a b b #

 Syntax tree for the given RE: a chain of cat nodes
(((((a | b)* · a) · b) · b) · #)
with leaf positions a → 1, b → 2 (under the | node), a → 3, b → 4, b → 5, # → 6.
The * node is the only nullable node in the tree.
Regular Expression ( a | b )* a b b # — firstpos and lastpos

firstpos and lastpos at each node (firstpos first, lastpos second):
leaf a (position 1): {1}, {1};  leaf b (position 2): {2}, {2}
or node (a | b): {1, 2}, {1, 2}
star node (a | b)* (nullable): {1, 2}, {1, 2}
leaf a (position 3): {3}, {3};  cat node with a: {1, 2, 3}, {3}
leaf b (position 4): {4}, {4};  cat node with b: {1, 2, 3}, {4}
leaf b (position 5): {5}, {5};  cat node with b: {1, 2, 3}, {5}
leaf # (position 6): {6}, {6};  root cat node: {1, 2, 3}, {6}
Construct DFA
Position  followpos
1         1, 2, 3
2         1, 2, 3
3         4
4         5
5         6

Initial state = firstpos of root = {1,2,3} ---- A

State A
δ({1,2,3}, a) = followpos(1) ∪ followpos(3) = {1,2,3} ∪ {4} = {1,2,3,4} ---- B
δ({1,2,3}, b) = followpos(2) = {1,2,3} ---- A
State B
δ({1,2,3,4}, a) = followpos(1) ∪ followpos(3) = {1,2,3} ∪ {4} = {1,2,3,4} ---- B
δ({1,2,3,4}, b) = followpos(2) ∪ followpos(4) = {1,2,3} ∪ {5} = {1,2,3,5} ---- C

State C
δ({1,2,3,5}, a) = followpos(1) ∪ followpos(3) = {1,2,3} ∪ {4} = {1,2,3,4} ---- B
δ({1,2,3,5}, b) = followpos(2) ∪ followpos(5) = {1,2,3} ∪ {6} = {1,2,3,6} ---- D
State D
δ({1,2,3,6}, a) = followpos(1) ∪ followpos(3) = {1,2,3} ∪ {4} = {1,2,3,4} ---- B
δ({1,2,3,6}, b) = followpos(2) = {1,2,3} ---- A
Resulting DFA (start state A; D is the accepting state, since it contains position 6 of #):
A --a--> B, A --b--> A
B --a--> B, B --b--> C
C --a--> B, C --b--> D
D --a--> B, D --b--> A

States         a  b
A = {1,2,3}    B  A
B = {1,2,3,4}  B  C
C = {1,2,3,5}  B  D
D = {1,2,3,6}  B  A
